An Active Learning Short Text Classification Method and System Based on Sampling Frequency Optimization

A text classification technique based on sampling frequency, applied in neural learning methods, text database clustering/classification, unstructured text data retrieval, and related fields. It aims at broad applicability and widens the optimization directions available to active learning.

Active Publication Date: 2021-04-06
上海乐言科技股份有限公司

AI Technical Summary

Problems solved by technology

However, because short texts are expressed in non-standard ways, labeling the required content consumes a lot of manpower, and the resulting effect is relatively limited.
[0003] At present, the industry uses active learning to handle short text classification. When mainstream active learning methods are applied directly to this task, however, they perform poorly on short text, and the performance of existing sampling methods degrades further as the number of categories in the data set increases.

Examples

Embodiment Construction

[0037] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. Note that the aspects described below in conjunction with the drawings and specific embodiments are only exemplary, and should not be construed as limiting the protection scope of the present invention.

[0038] Before describing the technical solution of the present invention, the principle of the active learning short text classification of the present invention will be described first.

[0039] For the active learning framework, assume a given data set $z=\{(x_1,y_1),\ldots,(x_N,y_N)\}$, where $x_i$ is a $D$-dimensional feature vector, $y_i\in\{0,1,\ldots,K\}$, and $N$, $K$, and $D$ are constants. The data set $z$ is divided into the labeled data set $L_t$ and the unlabeled data set $U_t$.
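
As a minimal sketch of the setup in [0039] (not taken from the patent text), the division of z into a labeled set L_t and an unlabeled pool U_t can be represented with two index lists. The function name `split_pool`, the seed-set size, and the random choice of the initial labeled items are all assumptions made for illustration.

```python
import random


def split_pool(n_samples, n_seed, seed=0):
    """Partition indices 0..n_samples-1 into (labeled, unlabeled) lists.

    ASSUMPTION: the initial labeled set is drawn uniformly at random;
    the source does not say how the first labeled items are chosen.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    labeled = sorted(indices[:n_seed])      # plays the role of L_t
    unlabeled = sorted(indices[n_seed:])    # plays the role of U_t
    return labeled, unlabeled


labeled_idx, unlabeled_idx = split_pool(n_samples=10, n_seed=3)
```

The two lists are disjoint and together cover every index, so moving an item from the pool to the labeled set after annotation is a simple remove-and-append.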

[0040] Active learning short text classification algorithms generally include the following steps:

[0041] a. A small fraction of labele...

Abstract

The invention discloses an active learning short text classification method and system based on sampling frequency optimization. It broadens the optimization directions of active learning and provides a simple, effective optimization framework that can be widely applied in industry. The technical scheme is as follows: the text classifier learns from the labeled data; the unlabeled data is sampled and evaluated based on what the classifier has learned, and the most valuable data is selected; the selected data is manually labeled and added to the labeled data. These steps are repeated until the number of iterations reaches its upper limit or the accuracy reaches the target. In the sampling evaluation process, the labeled data is grouped by category and the amount of labeled data in each category is counted to obtain the sampling frequency data for each category. Each unlabeled item is first evaluated to obtain an initial evaluation score and a predicted category; the sampling frequency data for that predicted category is then looked up, and the final evaluation score is computed from the initial evaluation score and the sampling frequency data of the corresponding category.
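
The sampling evaluation described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented formula: the abstract does not say how the initial score is computed or how it is combined with the sampling frequency. Here the initial score is assumed to be prediction entropy, the sampling frequency is assumed to be each category's share of the labeled data, and the final score is assumed to be `initial * (1 - frequency)`. All function names are hypothetical.

```python
import math
from collections import Counter


def sampling_frequencies(labeled_labels, num_classes):
    """Share of the labeled data that falls in each category."""
    total = len(labeled_labels)
    if total == 0:
        return [0.0] * num_classes
    counts = Counter(labeled_labels)
    return [counts.get(k, 0) / total for k in range(num_classes)]


def entropy(probs):
    """ASSUMPTION: entropy stands in for the unspecified initial score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def final_scores(pred_probs, freqs):
    """Combine each item's initial score with its predicted category's frequency."""
    scores = []
    for probs in pred_probs:
        initial = entropy(probs)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        # ASSUMPTION: down-weight categories that were already sampled often.
        scores.append(initial * (1.0 - freqs[predicted]))
    return scores


def select_batch(pred_probs, freqs, batch_size):
    """Indices of the unlabeled items with the highest final scores."""
    scores = final_scores(pred_probs, freqs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Category 0 already holds three of the four labeled items.
freqs = sampling_frequencies([0, 0, 0, 1], num_classes=2)
# Two equally uncertain items, predicted as category 0 and category 1.
pred_probs = [[0.6, 0.4], [0.4, 0.6]]
chosen = select_batch(pred_probs, freqs, batch_size=1)
```

With these inputs both items have the same entropy, so the item predicted as the under-sampled category 1 ranks first. In a full loop, the chosen items would be manually labeled, moved into the labeled data, and the classifier retrained until the iteration limit or the accuracy target is reached.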

Description

Technical field

[0001] The invention relates to a technique for using active learning to process short text classification, in particular to an active learning short text classification method and system based on sampling frequency optimization.

Background

[0002] With the prevalence of e-commerce and online communication, short text content is flooding people's daily life in many application fields, such as instant messaging, online chat logs, bulletin board system headlines, Internet news comments, and Twitter. To meet the resulting needs, such as topic recommendation and e-commerce chatbots, short text classification has become very important. However, because short texts are expressed in non-standard ways, labeling the required content consumes a lot of manpower, and the resulting effect is relatively limited.

[0003] At present, the industry uses active learning to deal with short text classification problems, but when the mainstream active learning ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045, G06F18/241
Inventor: 朱其立, 沈李斌, 廖千姿, 顾钰仪, 赵迎功, 吴海华
Owner: 上海乐言科技股份有限公司