A public opinion tendency identification method for training sample category distribution imbalance

A tendency identification method, applied in text database clustering/classification, character and pattern recognition, and unstructured text data retrieval. It addresses the deviation between the recognized and the actual public opinion tendency that arises from class-imbalanced training data, the timeliness of text publication, and the timeliness of public opinion, problems for which no effective solution had been proposed. The stated effect is improved classification accuracy and better recognition.

Active Publication Date: 2019-04-02
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0003] When general-purpose machine learning algorithms are used to analyze public opinion tendencies, class imbalance in the training data, the timeliness of text publication, and the timeliness of public opinion often cause the recognized tendency to deviate substantially from the actual tendency. No effective solution has yet been proposed.

Examples

Embodiment Construction

[0033] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0034] Referring to Figure 1, the present invention provides a public opinion tendency identification method for training sets with an imbalanced category distribution, comprising the following steps:

[0035] Step 1: Manually track and label current public opinion hot spots, select high-frequency words related to the public opinion field of interest as public opinion hot words, build a lexicon of these high-frequency words, and update it daily;

[0036] In this embodiment, the source of hot words in publ...
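
Step 1 describes manual collection of hot words. As a rough illustration only, the sketch below shows how candidate high-frequency words might be surfaced for a human curator to review. The function name `candidate_hot_words`, its parameters, and the regex tokenizer are all assumptions made for this example and do not come from the patent; Chinese text in particular would need a proper word segmenter in place of the regex.

```python
import re
from collections import Counter


def candidate_hot_words(comments, stop_words=frozenset(), top_k=10, min_count=2):
    """Return up to top_k (word, count) pairs seen at least min_count times.

    Hypothetical helper: the source only says hot words are selected
    manually from high-frequency vocabulary; this merely ranks candidates.
    """
    counts = Counter()
    for text in comments:
        # Assumption: a simple regex tokenizer stands in for real segmentation.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    # most_common sorts by descending count; drop words below the threshold.
    return [(w, c) for w, c in counts.most_common(top_k) if c >= min_count]
```

Running such a function once a day over newly collected comments would be one way to support the daily lexicon update, with the final selection still left to a person.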

Abstract

The invention discloses a public opinion tendency identification method for training sets with an imbalanced category distribution. The method first collects vocabulary related to the public opinion field of interest as public opinion hot words and builds a lexicon. A comment data set is then crawled from a public opinion information source and divided into a training set and a test set. The public opinion tendency of the training set is labeled manually, and a bootstrap learning method is used to supplement the data to address class imbalance. Features are extracted from each class of training samples, models are trained with algorithms such as naive Bayes, support vector machines, and decision trees, the trained model classifies the test set, and the public opinion tendency is identified from the classification result. Bootstrap learning, feature vector construction, and classification model training all apply time-sensitive weighting, so that the identified tendency better reflects current opinion. The method addresses the inaccurate classification caused by imbalanced training data and improves both the accuracy of public opinion tendency identification and the timeliness of public opinion analysis.
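
The abstract says that model training uses time-sensitive weighting but does not give the formula. The sketch below is one possible reading, not the patented method: it assumes exponential decay with a hypothetical `half_life_days` parameter and applies the resulting per-sample weights to the counts of a small multinomial naive Bayes classifier. The names `time_weight` and `WeightedNaiveBayes` are invented for this illustration, and the bootstrap supplementing step is not shown.

```python
import math
from collections import defaultdict


def time_weight(age_days, half_life_days=7.0):
    """Assumed decay: a sample half_life_days old counts half as much."""
    return 0.5 ** (age_days / half_life_days)


class WeightedNaiveBayes:
    """Multinomial naive Bayes whose counts are scaled by sample weights."""

    def fit(self, docs, labels, weights):
        self.class_weight = defaultdict(float)
        self.word_weight = defaultdict(lambda: defaultdict(float))
        self.vocab = set()
        for tokens, label, w in zip(docs, labels, weights):
            self.class_weight[label] += w  # weighted class prior mass
            for tok in tokens:
                self.word_weight[label][tok] += w  # weighted word count
                self.vocab.add(tok)
        return self

    def predict(self, tokens):
        total = sum(self.class_weight.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, cw in self.class_weight.items():
            # Laplace smoothing: add 1 per vocabulary word to the denominator.
            denom = sum(self.word_weight[label].values()) + vocab_size
            score = math.log(cw / total)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_weight[label].get(tok, 0.0)
                    score += math.log((count + 1.0) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: one recent positive comment and two month-old negative comments.
docs = [["good", "policy"], ["bad", "policy"], ["bad", "response"]]
labels = ["positive", "negative", "negative"]
weights = [time_weight(age) for age in (0, 30, 30)]
model = WeightedNaiveBayes().fit(docs, labels, weights)
prediction = model.predict(["policy"])
```

With these toy inputs the recent sample dominates the older ones, whereas fitting the same data with uniform weights favors the majority class. That contrast is the kind of effect time-sensitive weighting is meant to produce, under the assumptions stated above.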

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing and machine learning. It relates to a method for analyzing public opinion tendency with machine learning algorithms, and in particular to a public opinion tendency identification method for training sets with an imbalanced category distribution.

Background

[0002] Internet penetration is growing rapidly, the volume of news updated online is very large, and its impact on public opinion is correspondingly large. Public opinion tendency analysis emerged in this context. It aims to analyze the public opinion generated on the Internet and to identify commenters' attitudes and changes in attitude promptly, thereby helping regulatory authorities detect shifts in public opinion in time and build a civilized and harmonious public opinion environment. [00...

Application Information

IPC(8): G06F17/27, G06F16/35, G06K9/62
CPC: G06F40/289, G06F18/24155
Inventor: 彭蓉王卓洪涛
Owner: WUHAN UNIV