Preprocessing method for feature verification in text sentiment analysis

A sentiment analysis preprocessing technology, applied in special data processing applications, text database clustering/classification, electronic digital data processing, etc. It addresses the low accuracy of existing approaches and their insufficient consideration of unbalanced corpus distribution and of misclassification in manually constructed sentiment analysis corpora. Its effects are improved accuracy, good versatility and scalability, and richer information for evaluation.

Active Publication Date: 2016-04-06
科来网络技术股份有限公司
3 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0003] 1. General-purpose schemes do not go deep enough into the business domain of sentiment analysis. In particular, sentiment data on the Internet has a clearly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. Both issues are generally given insufficient consideration;
[0004] 2. The common basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications
[0006] 1. Most existing Chinese sentiment analysis algorithms have low accuracy, and there is a lack of feature verification or feature



Examples


Embodiment Construction

[0032] A preprocessing method for text sentiment analysis feature verification, comprising the following steps:

[0033] 1. Preprocess the original training set to obtain preprocessing information:

[0034] As shown in Figure 1, this step includes the following:

[0035] 1.1. Perform a summary analysis of the original training set and record the output as sample_struct:

[0036] (1) Determine whether the total number of samples is large enough. The result is stored in the parameter sample_size. For sentiment analysis, a sample set is large enough when each valid class contains more than 1000 unique samples.

[0037] (2) Determine whether the distribution of sentiment classes is balanced. The result is stored in the parameter sample_dist, which includes the number of samples in each class. If the number of samples of different classifications is not m...
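The summary analysis in step 1.1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The function name summarize_training_set, the (text, label) input format, the extra "balanced" key, and the max_ratio balance test are all assumptions: the source text is truncated before it states its balance criterion, so the ratio test below is only a placeholder. Class counts are taken over unique samples, which is also an assumption for sample_dist.

```python
from collections import Counter

# Threshold from paragraph [0036]: more than 1000 unique samples per class.
MIN_UNIQUE_PER_CLASS = 1000


def summarize_training_set(samples, min_unique=MIN_UNIQUE_PER_CLASS, max_ratio=3.0):
    """Build a sample_struct summary from an iterable of (text, label) pairs.

    max_ratio is a HYPOTHETICAL balance criterion (largest class size divided
    by smallest class size); the source does not specify one.
    """
    # Deduplicate identical (text, label) pairs so only unique samples count.
    unique_samples = set(samples)

    # sample_dist: number of unique samples in each class.
    sample_dist = Counter(label for _, label in unique_samples)

    # sample_size: True only if every class exceeds the minimum count.
    sample_size = bool(sample_dist) and all(
        count > min_unique for count in sample_dist.values()
    )

    # ASSUMPTION: treat the set as balanced when the largest class is at most
    # max_ratio times the smallest class.
    balanced = bool(sample_dist) and (
        max(sample_dist.values()) / min(sample_dist.values()) <= max_ratio
    )

    return {
        "sample_size": sample_size,
        "sample_dist": dict(sample_dist),
        "balanced": balanced,
    }


if __name__ == "__main__":
    toy = [("great phone", "pos"), ("great phone", "pos"), ("bad battery", "neg")]
    print(summarize_training_set(toy, min_unique=0))
```

Keeping the per-class counts in sample_dist, rather than only a yes/no flag, lets later steps decide how to react to an unbalanced corpus.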


Abstract

The invention discloses a preprocessing method for feature verification in text sentiment analysis. The method comprises the following steps: preprocessing an original training set to obtain preprocessing information, and then performing feature verification and feature selection on that information. Preprocessing the original training set comprises: determining a summary of the original training set, determining a summary of the original feature vector set, and expanding the original data to construct integrated processing information. The beneficial effects are that analysis information is generated from both the training set and the feature vectors, which ensures that the evaluation conclusion is based on rich information and helps improve the accuracy of the whole sentiment analysis workflow. The method has good generality and extensibility, and works well with sentiment analysis algorithms that differ in modeling and implementation.

Description

Technical field

[0001] The invention belongs to the field of text sentiment analysis, and in particular relates to a preprocessing method for feature verification in text sentiment analysis.

Background

[0002] Existing feature selection and verification schemes for text classification have achieved good results in content-domain classification, but they have the following problems when applied to sentiment analysis:

[0003] 1. General-purpose schemes do not go deep enough into the business domain of sentiment analysis. In particular, sentiment data on the Internet has a clearly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. Both issues are generally given insufficient consideration;

[0004] 2. The common basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications. For example, in the commonly u...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/374
Inventor: 罗鹰张鑫阳林康
Owner: 科来网络技术股份有限公司