Classifier construction method and device as well as Chinese text sentiment classification method and system

A classifier construction method, applied in the fields of instruments, special-purpose data processing applications, and electrical digital data processing. It addresses the heavy dependence of classifiers on the application domain and their long construction time, so as to avoid labor costs, shorten construction time, and improve accuracy.

Inactive Publication Date: 2013-04-03
SUZHOU UNIV
4 Cites 12 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0005] In view of this, the present invention provides a classifier construction method and device, a Chinese text sentiment classification method and system,...

Abstract

The invention provides a classifier construction method and device as well as a Chinese text sentiment classification method and system. The classification method comprises the following steps of: obtaining a sample to be labeled from a sample set to be labeled; looking up sentiment words in the sample to be labeled; obtaining the sentiment polarity of each sentiment word; converting the sentiment polarity of the sentiment words of which the sentiment polarity conforms to a sentiment polarity conversion rule in the sample to be labeled; counting the amount of the sentiment words of which the sentiment polarity is negative and positive in the sample to be labeled; according to the amount of the sentiment words of which the sentiment polarity is positive and the amount of the sentiment words of which the sentiment polarity is negative, determining the sentiment polarity of the sample to be labeled to obtain a labeled sample; according to the labeled sample, labeling other samples to be labeled in the sample set to be labeled to obtain a labeled sample set; constructing a maximum entropy classifier by the labeled sample set; and classifying a Chinese text to be classified by the maximum entropy classifier. According to the method, the device and the system provided by the invention, the Chinese text classification time is shortened, and the classification accuracy is improved.

Example

[0068] Example 1: I don't like this product.
[0069] In the sentence of Example 1, the sentiment word is "like". Because a negation keyword (here "don't") appears in the same sentence, the sentiment polarity of "like" is converted from positive to negative.

Example

[0070] Example 2: I like the idea of this product, but the quality is not acceptable to me.
[0071] In the sentence of Example 2, the sentiment word is "like". Because the transition keyword "but" appears in the clause that follows the one containing "like", the sentiment polarity of "like" is converted from positive to negative.

Example

[0072] Example 3: It would be nice if the color was red.
[0073] In the sentence of Example 3, the sentiment word is "good" (rendered as "nice" in the translated example). Because the keyword "if" appears before the sentiment word in the same sentence (in the original word order), the sentiment polarity of "good" is converted from positive to negative.
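The three conversion rules illustrated by Examples 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon and keyword lists are hypothetical (the text does not enumerate them), the input is assumed to be already segmented into clauses and tokens, and a polarity is flipped once if any rule matches.

```python
POSITIVE, NEGATIVE = 1, -1

# Hypothetical toy lexicon and keyword lists; the patent text does not list them.
LEXICON = {"like": POSITIVE, "good": POSITIVE, "bad": NEGATIVE}
NEGATION = {"not", "no", "don't"}
TRANSITION = {"but", "however"}
CONDITIONAL = {"if"}


def sentiment_polarities(clauses):
    """Return (word, polarity) pairs for one sample.

    `clauses` is a list of clauses, each clause a list of lowercase tokens.
    """
    results = []
    for i, clause in enumerate(clauses):
        next_clause = clauses[i + 1] if i + 1 < len(clauses) else []
        for pos, token in enumerate(clause):
            if token not in LEXICON:
                continue
            polarity = LEXICON[token]
            # Rule 1: a negation keyword appears in the same clause.
            negated = any(t in NEGATION for t in clause)
            # Rule 2: a transition keyword appears in the following clause.
            transition_follows = any(t in TRANSITION for t in next_clause)
            # Rule 3: a conditional keyword appears before the sentiment word.
            conditional_before = any(t in CONDITIONAL for t in clause[:pos])
            if negated or transition_follows or conditional_before:
                polarity = -polarity
            results.append((token, polarity))
    return results


# Example 1
print(sentiment_polarities([["i", "don't", "like", "this", "product"]]))
# Example 2
print(sentiment_polarities([
    ["i", "like", "the", "idea", "of", "this", "product"],
    ["but", "the", "quality", "is", "not", "acceptable", "to", "me"],
]))
# Example 3, with tokens ordered so that "if" precedes the sentiment word
print(sentiment_polarities([["if", "the", "color", "were", "red", "it", "would", "be", "good"]]))
```

All three calls report the sentiment word with polarity -1, matching the conversions described in the examples.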
[0074] In yet another embodiment of the present invention, step S106 may include: if the number of sentiment words with positive polarity exceeds the number of sentiment words with negative polarity by more than a set threshold, the sentiment polarity of the sample to be labeled is determined to be positive; if the number of sentiment words with negative polarity exceeds the number with positive polarity by more than the set threshold, the sentiment polarity of the sample to be labeled is determined to be negative. Let the number of sentiment words with positive polarity be $N_+$, the number with negative polarity be $N_-$, and the threshold be $N_{max}$. If $N_+ - N_- > N_{max}$, the sentiment polarity of the sample to be labeled is determined to be positive; if $N_- - N_+ > N_{max}$, it is determined to be negative.
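The threshold rule can be written as a small function. This is a minimal sketch: the function name is hypothetical, and returning `None` when neither inequality holds is an assumption, since the text does not say what happens to samples whose counts are too close.

```python
def label_sample(n_pos, n_neg, n_max):
    """Label a sample from its positive and negative sentiment-word counts.

    Returns "positive" if n_pos - n_neg > n_max, "negative" if
    n_neg - n_pos > n_max, and None otherwise (assumption: the sample
    stays unlabeled and is left for the classifier to label later).
    """
    if n_pos - n_neg > n_max:
        return "positive"
    if n_neg - n_pos > n_max:
        return "negative"
    return None
```

A larger threshold labels fewer samples automatically but makes the automatically produced labels more reliable, which matters because they seed the classifier.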
[0075] In yet another embodiment of the present invention, step S105 may include: constructing a maximum entropy classifier from the labeled samples; using that classifier to classify the other samples to be labeled in the sample set to obtain a classification result; and determining the sentiment polarity of each sample to be labeled according to the classification result. This finally yields two standard sample sets: the positive labeled sample set and the negative labeled sample set.
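The labeling step described here can be sketched as a short loop. The classifier is deliberately abstracted as a `train` callable so that any implementation can be plugged in. All names are hypothetical, and including the seed samples in the two output sets is an assumption.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Sample = Sequence[str]
Labeled = Tuple[Sample, str]
Classifier = Callable[[Sample], str]  # returns "positive" or "negative"


def label_sample_set(
    seed: List[Labeled],
    unlabeled: Iterable[Sample],
    train: Callable[[List[Labeled]], Classifier],
) -> Tuple[List[Sample], List[Sample]]:
    """Train on the seed samples, classify the rest, return (positive, negative)."""
    classify = train(seed)
    # Assumption: the automatically labeled seed samples are kept in the result.
    positive = [sample for sample, label in seed if label == "positive"]
    negative = [sample for sample, label in seed if label == "negative"]
    for sample in unlabeled:
        if classify(sample) == "positive":
            positive.append(sample)
        else:
            negative.append(sample)
    return positive, negative
```

Passing the trainer in as a parameter keeps the loop independent of how the maximum entropy model itself is fitted.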
[0076] The maximum entropy classifier, one of the machine learning classification methods, is based on the maximum entropy principle from information theory. Its basic idea is to model everything that is known and to assume nothing about what is unknown; that is, to find a probability distribution that satisfies all known facts while leaving the unknown factors as random as possible. Compared with the Naive Bayes method, its most notable feature is that it does not require the features to be conditionally independent. It is therefore suitable for fusing a variety of different features without having to consider how they influence one another.
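For reference, this idea is commonly written as a constrained optimization problem. The formulation below is the standard textbook one rather than a quotation from the patent; $\tilde{p}$ (the empirical distribution of the labeled samples) is notation introduced here, and $F_{k,c}$ denotes the model's feature functions.

$$\max_{p}\; H(p) = -\sum_{D} \tilde{p}(D) \sum_{c'} p(c' \mid D) \log p(c' \mid D)$$

$$\text{subject to}\quad \sum_{D} \tilde{p}(D) \sum_{c'} p(c' \mid D)\, F_{k,c}(D, c') = \sum_{D, c'} \tilde{p}(D, c')\, F_{k,c}(D, c') \quad \text{for every feature } (k, c), \qquad \sum_{c'} p(c' \mid D) = 1 .$$

The constraints encode the "known facts" (the model's expected feature values match the observed ones), and maximizing the entropy $H(p)$ keeps everything else as uniform as possible. The solution of this problem has an exponential form with one weight per feature.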
[0077] Under the maximum entropy model, the formula for predicting the conditional probability P(c|D) is as follows:
[0078] $$P(c_i \mid D) = \frac{1}{Z(D)} \exp\Big(\sum_{k} \lambda_{k,c} F_{k,c}(D, c_i)\Big)$$
[0079] where $Z(D)$ is the normalization factor and $F_{k,c}$ is the feature function, defined as:
[0080] $$F_{k,c}(D, c') = \begin{cases} 1, & n_k(D) > 0 \text{ and } c' = c \\ 0, & \text{otherwise} \end{cases}$$
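The model above can be sketched in plain Python as follows. Several points are assumptions rather than statements from the text: $n_k(D)$ is taken to count occurrences of word $k$ in $D$, there are two classes, and the weights $\lambda_{k,c}$ are fitted by simple gradient ascent on the conditional log-likelihood (the text does not name a training algorithm). All function names are hypothetical.

```python
import math

CLASSES = ("positive", "negative")


def feature(k, c, doc, c_prime):
    """F_{k,c}(D, c'): 1 if word k occurs in D and c' equals c, otherwise 0."""
    return 1 if k in doc and c_prime == c else 0


def predict_proba(lam, doc):
    """P(c_i | D) for every class, given weights lam[(k, c)]."""
    scores = {
        ci: sum(w * feature(k, c, doc, ci) for (k, c), w in lam.items())
        for ci in CLASSES
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {ci: math.exp(s - top) for ci, s in scores.items()}
    z = sum(exps.values())  # plays the role of the normalization factor Z(D)
    return {ci: e / z for ci, e in exps.items()}


def train(labeled, epochs=200, lr=0.5):
    """Fit the weights by gradient ascent (an assumed training procedure)."""
    vocab = {k for doc, _ in labeled for k in doc}
    lam = {(k, c): 0.0 for k in vocab for c in CLASSES}
    for _ in range(epochs):
        grad = dict.fromkeys(lam, 0.0)
        for doc, label in labeled:
            probs = predict_proba(lam, doc)
            for k, c in lam:
                observed = feature(k, c, doc, label)
                expected = sum(probs[cp] * feature(k, c, doc, cp) for cp in CLASSES)
                grad[(k, c)] += observed - expected
        for key in lam:
            lam[key] += lr * grad[key] / len(labeled)
    return lam


# Toy labeled samples (hypothetical data, already tokenized)
seed = [
    (["good", "quality"], "positive"),
    (["like", "it"], "positive"),
    (["bad", "quality"], "negative"),
    (["hate", "it"], "negative"),
]
weights = train(seed)
print(predict_proba(weights, ["good", "product"]))
```

Words that occur equally often in both classes (here "quality" and "it") receive weights near zero, so the prediction for the new sample is driven by "good".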
[0081] The present invention also provides a Chinese text sentiment classification method. In addition to the above steps S101-S107, the method further includes: classifying the Chinese text to be classified by using the constructed maximum entropy classifier.
