Text classification method, text classifier and storage medium for unbalanced data set

A text classification and data balancing technology, applied in the field of text information, that addresses problems such as data set imbalance.

Active Publication Date: 2018-10-09
WEBANK (CHINA)
Cites: 9; Cited by: 32

AI Technical Summary

Problems solved by technology

[0004] The main purpose of the present invention is to provide a text classification method, a text classifier and a storage medium for unbalanced data sets, aiming to solve the technical problem that traditional classification methods are limited when facing unbalanced data sets, and thereby to improve the accuracy and efficiency of text classification.



Detailed Description of the Embodiments

[0051] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0052] Refer to Figure 1, which is a schematic structural diagram of the operating environment of the text classifier involved in the embodiments of the present invention.

[0053] As shown in Figure 1, the text classifier can be a computer device such as a mobile phone, a notebook, a tablet computer, or a cloud server, and can include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface or a wireless interface (such ...


Abstract

The invention discloses a text classification method, a text classifier and a storage medium for an unbalanced data set. The method comprises the following steps: acquiring the data set used for training a classification model; determining whether each piece of text data is a majority-class sample or a minority-class sample according to the category information with which the text data is labeled; calculating the ratio between the number of majority-class samples and the number of minority-class samples to obtain an imbalance ratio; preprocessing the text data to obtain the corresponding sample points mapped into a vector space; obtaining interpolated samples based on a preset interpolation strategy, the imbalance ratio and each sample point, and updating the data set with them; training the classification model using the updated data set as the training sample set; and acquiring the text data to be tested and feeding it into the trained classification model to obtain the category of the text data to be tested as the classification result. According to the invention, the minority-class samples and their boundary region can be expanded, and the classification performance of the model can be effectively improved.
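
The data-balancing steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented procedure: the abstract does not say what the "preset interpolation strategy" is, so the sketch assumes SMOTE-style linear interpolation between two randomly chosen minority-class points, assumes exactly two classes, and assumes the text has already been mapped to numeric vectors. The function names `imbalance_ratio`, `interpolate_minority` and `rebalance` are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority_label, minority_label, majority_count / minority_count)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    (maj, n_maj), (mino, n_min) = counts.most_common(2)
    return maj, mino, n_maj / n_min


def interpolate_minority(points, n_new, rng):
    """Create n_new synthetic points on segments between pairs of minority points.

    Assumption: SMOTE-style linear interpolation; the source only says
    "preset interpolation strategy".
    """
    if len(points) < 2:
        raise ValueError("need at least two minority points to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(points, 2)
        t = rng.random()  # position along the segment a -> b, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def rebalance(vectors, labels, seed=0):
    """Add interpolated minority points until both classes have equal counts."""
    rng = random.Random(seed)
    maj, mino, ratio = imbalance_ratio(labels)
    minority = [v for v, y in zip(vectors, labels) if y == mino]
    n_new = labels.count(maj) - len(minority)
    new_points = interpolate_minority(minority, n_new, rng)
    return vectors + new_points, labels + [mino] * n_new, ratio
```

The updated vectors and labels returned by `rebalance` would then serve as the training sample set for whatever classifier is chosen. Here the ratio is used only to decide how many points to add; the source may use it differently.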

Description

Technical field

[0001] The invention relates to the technical field of text information, in particular to a text classification method for an unbalanced data set, a text classifier and a storage medium.

Background

[0002] With the advancement of science and technology and the widespread adoption of the Internet, automatic text classification technology has emerged to cope with massive amounts of text data, such as comments published or fed back by users.

[0003] At present, machine learning methods are gradually being applied to text classification. Traditional machine learning methods assume that the data are balanced. In practical applications, however, most data are unevenly distributed, which makes text classification methods less effective. There are two ways to deal with the problem of data imbalance, one at the algorithm level and the other at the sampling level, so as to achieve class balance and improve the ...
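
To make the two levels concrete, here is a small sketch of one common technique at each level. The source is truncated and does not name specific techniques, so both choices are assumptions: inverse-frequency class weights stand in for the algorithm level, and random oversampling stands in for the sampling level. The function names are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Algorithm level: weight each class by n_samples / (n_classes * class_count).

    A classifier that accepts per-class weights can then penalize errors on
    rare classes more heavily.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Sampling level: duplicate random samples of smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for c, m in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == c]
        extra = [rng.choice(pool) for _ in range(target - m)]
        out_x.extend(extra)
        out_y.extend([c] * len(extra))
    return out_x, out_y
```

Random oversampling only repeats existing points, whereas interpolation-based methods create new points between existing ones.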


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30, G06K9/62
CPC: G06F18/24147, G06F18/214
Inventors: 刘志煌, 吴三平
Owner: WEBANK (CHINA)