Pre-processing method for verification of text emotion analysis characteristics

A sentiment analysis pre-processing technology, applied in special data processing applications, text database clustering/classification, and electronic digital data processing. It addresses the low accuracy of existing approaches, the insufficient consideration given to unbalanced corpus distribution, and the tendency to misclassify when sentiment analysis corpora are constructed manually. The claimed effects are good versatility and scalability, improved accuracy, and richer evaluation information.

Status: Inactive
Publication Date: 2018-06-29
QINGDAO XIANGZHI ELECTRONICS TECH CO LTD
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

[0003] 1. General-purpose schemes do not go deep enough into the sentiment analysis domain. In particular, emotional information on the Internet has a clearly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. Both issues are generally given insufficient consideration.
[0004] 2. The common basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications.
[0006] 1. Most existing Chinese sentiment analysis algorithms have low accuracy, and there is a lack of feature verification or feature selection schemes that can guide algorithm improvement. For example, according to the results of the 5th Chinese Tendency Analysis and Evaluation Symposium (COAE2013), accuracy is generally about 60%.
[0007] 2. Text information is represented by feature vectors. However, because the field lacks recognized best practices, sentiment analysis models and algorithms vary widely. A recognition and verification scheme for sentiment analysis feature vectors therefore also needs to account for the characteristics of these algorithms and models, such as the commonly used bag-of-words, n-gram, and word2vec models.
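
To make the phrase "represented by feature vectors" concrete, the following is a minimal Python sketch of bag-of-words and n-gram counting. It is an illustration only, not the patent's method: the function names (ngrams, bag_of_ngrams, to_vector) and the toy sentence are assumptions introduced here, and word2vec is left out because it requires a trained embedding model.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(tokens, n=1):
    """Count each n-gram; n=1 corresponds to a plain bag of words."""
    return Counter(ngrams(tokens, n))


def to_vector(counts, vocabulary):
    """Project counts onto a fixed vocabulary order to get a feature vector."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical toy input, whitespace-tokenized for simplicity.
tokens = "the service was not good not good at all".split()

unigrams = bag_of_ngrams(tokens, 1)   # bag of words
bigrams = bag_of_ngrams(tokens, 2)    # 2-gram counts keep local word order

vocab = sorted(unigrams)              # fixed feature order
vector = to_vector(unigrams, vocab)
```

The bigram counts keep "not good" together as one feature, which a plain bag of words cannot do. This is one reason a verification scheme may need to treat the two representations differently.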

Embodiment Construction

[0029] A preprocessing method for text sentiment analysis feature verification, comprising the following steps:

[0030] 1. Preprocessing the original training set to obtain preprocessing information:

[0031] This step includes the following specific content:

[0032] 1.1. Perform a summary analysis of the original training set and record the output as sample_struct:

[0033] (1) Judge whether the total number of samples is large enough. The result is represented by the parameter sample_size. For sentiment analysis, the sample is considered large enough when each valid class has more than 1000 unique samples.

[0034] (2) Judge whether the distribution of sentiment classes is balanced. The result is represented by the parameter sample_dist, which includes the number of samples in each class. If the number of samples in the different classes does not differ much, it is balance...
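
The two checks in step 1.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the names sample_struct, sample_size, and sample_dist and the 1000-sample threshold come from the text above, while the function name summarize_training_set, the "balanced" field, and the ratio threshold MAX_IMBALANCE_RATIO are hypothetical, since the visible text does not give a numeric criterion for balance.

```python
from collections import Counter

MIN_UNIQUE_PER_CLASS = 1000  # threshold stated in [0033]
MAX_IMBALANCE_RATIO = 2.0    # assumption: no numeric balance criterion is given in the text


def summarize_training_set(samples,
                           min_unique=MIN_UNIQUE_PER_CLASS,
                           max_ratio=MAX_IMBALANCE_RATIO):
    """Summarize (text, label) pairs into a dict standing in for sample_struct."""
    unique = set(samples)  # count only unique samples, as [0033] requires
    sample_dist = Counter(label for _, label in unique)

    # Large enough: every class has more than `min_unique` unique samples.
    sample_size = bool(sample_dist) and all(
        count > min_unique for count in sample_dist.values()
    )

    # Balanced (hypothetical rule): largest class is at most
    # `max_ratio` times the smallest class.
    if sample_dist:
        counts = sample_dist.values()
        balanced = max(counts) <= max_ratio * min(counts)
    else:
        balanced = False

    return {
        "sample_size": sample_size,
        "sample_dist": dict(sample_dist),
        "balanced": balanced,
    }


# Toy usage with a lowered threshold so the example stays small.
toy = [
    ("great phone", "pos"),
    ("great phone", "pos"),  # duplicate, ignored after de-duplication
    ("bad battery", "neg"),
    ("love it", "pos"),
]
sample_struct = summarize_training_set(toy, min_unique=1)
```

Keeping the per-class counts in sample_dist, rather than only a yes/no flag, leaves later steps free to apply their own balance criterion.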

Abstract

The invention discloses a pre-processing method for feature verification in text sentiment analysis. The method pre-processes an original training set to obtain pre-processing information: it determines a summary of the original training set, determines a summary of the original feature vector set, and augments the original data, thereby building integrated pre-processing information. The pre-processing information is then used for feature verification and feature selection. The advantages of the method are that analysis information is generated from both the training sets and the feature vectors, which ensures that the evaluation results are information-rich, and that it helps improve the accuracy of the whole sentiment analysis process. The method also has good versatility and extensibility and works well with a variety of modeling approaches and sentiment analysis algorithms.

Description

Technical field

[0001] The invention belongs to the field of text sentiment analysis, and in particular relates to a pre-processing method for feature verification of text sentiment analysis.

Background technique

[0002] Existing feature selection and verification schemes for text classification have achieved good results in content domain classification, but they have the following problems when applied to sentiment analysis:

[0003] 1. General-purpose schemes do not go deep enough into the sentiment analysis domain. In particular, emotional information on the Internet has a clearly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. Both issues are generally given insufficient consideration.

[0004] 2. The common basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications. For example, in the commonly u...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: name withheld (不公告发明人)
Owner: QINGDAO XIANGZHI ELECTRONICS TECH CO LTD