Bullet screen text classification method, device, equipment, and storage medium

A text classification technology applied in the field of big data. It addresses the poor performance of classification models caused by an unbalanced distribution of training samples, and it reduces the risk of over-fitting, reduces bias, and achieves higher recognition accuracy.

Pending Publication Date: 2019-11-01
WUHAN DOUYU NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The present invention proposes a bullet-screen text classification method, device, equipment and storage medium, which address the poor performance of classification models caused by an unbalanced distribution of training samples when classifying live-stream bullet-screen texts.

Detailed Description of the Embodiments

[0046] The present invention proposes a bullet-screen text classification method, device, equipment and storage medium. Using a model combination approach, existing classification models are combined into a single, stronger classifier, which improves classification accuracy.

[0047] To make the purpose, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0048] The problem of...

Abstract

The invention provides a bullet-screen text classification method, device, equipment and storage medium. The method comprises: obtaining an imbalanced training data set with pre-labeled categories, and dividing it into sufficient samples and insufficient samples; training a textCNN model on the sufficient samples; training an SVM classifier on the insufficient samples; inputting a text to be classified into the trained textCNN model, which outputs classification probabilities for the categories in the sufficient samples; and, if the output classification probability is smaller than a first preset threshold, inputting the text into the trained SVM classifier, which outputs a predicted category. The method trains separate classification models for different sample sizes and then combines the two models to classify the text. This addresses the data imbalance in the training samples and, compared with training a single model, reduces the risk of over-fitting, reduces bias, and yields higher recognition accuracy.
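The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `split_by_class_size`, `cascade_predict`, `min_count`, `toy_textcnn` and `toy_svm` are hypothetical, the cutoff that separates sufficient from insufficient classes is an assumption (the source does not give one), and "output classification probability" is read here as the highest class probability. The two toy functions merely stand in for a trained textCNN model and a trained SVM classifier.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def split_by_class_size(
    data: List[Sample], min_count: int
) -> Tuple[List[Sample], List[Sample]]:
    """Split labeled data into 'sufficient' and 'insufficient' subsets.

    A class with at least `min_count` samples counts as sufficient.
    `min_count` is an assumed cutoff; the source does not specify one.
    """
    counts = Counter(label for _, label in data)
    sufficient = [s for s in data if counts[s[1]] >= min_count]
    insufficient = [s for s in data if counts[s[1]] < min_count]
    return sufficient, insufficient


def cascade_predict(
    text: str,
    primary_proba: Callable[[str], Dict[str, float]],
    fallback_predict: Callable[[str], str],
    threshold: float,
) -> str:
    """Return the primary model's top class, or defer to the fallback model
    when the top probability is below `threshold`."""
    probs = primary_proba(text)
    best_label, best_prob = max(probs.items(), key=lambda kv: kv[1])
    if best_prob < threshold:
        return fallback_predict(text)
    return best_label


# Toy stand-ins for the trained models (assumptions, for illustration only).
def toy_textcnn(text: str) -> Dict[str, float]:
    if "buy" in text:
        return {"ad": 0.9, "normal": 0.1}
    return {"ad": 0.4, "normal": 0.6}


def toy_svm(text: str) -> str:
    return "rare_category"


data = [("t1", "ad")] * 5 + [("t2", "rare_category")] * 2
sufficient, insufficient = split_by_class_size(data, min_count=3)
confident = cascade_predict("buy now", toy_textcnn, toy_svm, threshold=0.7)
deferred = cascade_predict("hello", toy_textcnn, toy_svm, threshold=0.7)
```

In this sketch the first text is decided by the primary model because its top probability exceeds the threshold, while the second is handed to the fallback model. Keeping the two models behind plain callables leaves the choice of libraries and features open.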

Description

Technical field

[0001] The invention belongs to the technical field of big data, and in particular relates to a bullet-screen text classification method, device, equipment and storage medium.

Background

[0002] On live-streaming platforms, malicious actors send large numbers of advertising or pornographic bullet-screen comments in live rooms in order to disrupt the platform's live-streaming environment and obtain improper benefits illegally. This badly damages the user experience and, directly and indirectly, harms the interests of the platform and of ordinary users. At present, spam bullet-screen comments can be intercepted by training text models. However, training a spam bullet-screen text classification model often runs into data with a typical "80/20 distribution": the training samples are unevenly distributed across categories, and the resulting classification model performs poorly. For example, the model always divides the samples to b...
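One well-known way an uneven class distribution can mislead a classifier is shown below. This is an illustration of the general effect, not the patent's own example: the 80/20 split, the class names `frequent` and `rare`, and the always-predict-the-majority baseline are all assumptions made for the sketch.

```python
from collections import Counter

# Assumed label distribution: 80 samples of one class, 20 of another.
labels = ["frequent"] * 80 + ["rare"] * 20

# A degenerate baseline that always predicts the most common class.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

# Overall accuracy looks acceptable because the frequent class dominates.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the rare class: the fraction of rare samples that were found.
rare_total = labels.count("rare")
rare_hits = sum(p == y == "rare" for p, y in zip(predictions, labels))
rare_recall = rare_hits / rare_total

print(f"accuracy={accuracy:.2f}, rare_recall={rare_recall:.2f}")
```

The baseline scores 0.80 accuracy while finding none of the rare samples, which is why accuracy alone can hide a poorly performing model on imbalanced data.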

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/35, H04N21/435, G06K9/62
CPC: G06F16/35, H04N21/435, G06F18/2411, G06F18/24147, G06F18/241
Inventor: 王姣
Owner: WUHAN DOUYU NETWORK TECH CO LTD