
Active learning short text classification method and system based on sampling frequency optimization

A sampling frequency and active learning technology, applied in fields such as neural learning methods, text database clustering/classification, and unstructured text data retrieval, with a wide range of applications.

Active Publication Date: 2020-11-06
上海乐言科技股份有限公司
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Because short texts are expressed in non-standard ways, labeling the required content consumes a lot of manpower, and the resulting effect is relatively limited.
[0003] The industry currently uses active learning to handle short text classification, but mainstream active learning methods perform poorly when applied to it directly, and the performance of existing sampling methods degrades further as the number of categories in the data set increases.



Examples


Detailed Description of the Embodiments

[0037] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. Note that the aspects described below in conjunction with the drawings and specific embodiments are only exemplary, and should not be construed as limiting the protection scope of the present invention.

[0038] Before describing the technical solution of the present invention, the principle of the active learning short text classification of the present invention will be described first.

[0039] For the active learning framework, assume a given data set $z = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i$ is a $D$-dimensional feature vector, $y_i \in \{0, 1, \ldots, K\}$, and $N$, $K$, and $D$ are constants. The data set $z$ is divided into the labeled data set $L_t$ and the unlabeled data set $U_t$.
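The partition of the data set into a labeled set and an unlabeled pool can be illustrated with a minimal Python sketch. The function name `split_pool`, the random seed selection, and the `n_seed` parameter are assumptions made for illustration only; the source does not say how the initial labeled set is chosen.

```python
import random
from typing import List, Tuple

# One sample is (x_i, y_i): a D-dimensional feature vector and a class label.
Sample = Tuple[List[float], int]


def split_pool(
    z: List[Sample], n_seed: int, seed: int = 0
) -> Tuple[List[Sample], List[List[float]]]:
    """Split z into an initial labeled set L_0 and an unlabeled pool U_0.

    Assumption: the seed set is drawn uniformly at random. Labels of the
    pool samples are dropped to mimic data that has not been annotated yet.
    """
    rng = random.Random(seed)
    indices = list(range(len(z)))
    rng.shuffle(indices)
    labeled = [z[i] for i in indices[:n_seed]]
    unlabeled = [z[i][0] for i in indices[n_seed:]]
    return labeled, unlabeled
```

For example, `split_pool(z, 3)` on a ten-sample data set returns three labeled pairs and seven unlabeled feature vectors.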

[0040] Active learning short text classification algorithms generally include the following steps:

[0041] a. A small fraction of labele...


Abstract

The invention discloses an active learning short text classification method and system based on sampling frequency optimization, which broadens the direction of active learning optimization and provides a simple, effective optimization framework that can be widely used in industry. According to the technical scheme, a text classifier learns from the labeled data; the unlabeled data is sampled and evaluated based on the learning result of the text classifier, and the most valuable data is selected; the selected data is manually labeled and added to the labeled data; and these steps are repeated until the number of iterations reaches the upper limit or the accuracy reaches the standard. In the sampling evaluation process, the labeled data is grouped by category, and the amount of labeled data in each category is counted to obtain per-category sampling frequency data. Each unlabeled item is evaluated to obtain an initial evaluation score and a predicted category; the sampling frequency data for the predicted category is then looked up, and a final evaluation score is obtained from the initial evaluation score and the sampling frequency data of that category.
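The sampling evaluation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the initial scoring function or say how the initial score and the sampling frequency are combined. Here prediction entropy stands in for the initial score, and the final score is taken as `initial * (1 - frequency)` so that categories already sampled often are down-weighted. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def sampling_frequencies(labels: Sequence[int]) -> Dict[int, float]:
    """Share of the labeled data that falls into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def entropy(probs: Sequence[float]) -> float:
    """Prediction entropy, used here as the initial score (an assumption)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def final_scores(
    prob_rows: Sequence[Sequence[float]], freqs: Dict[int, float]
) -> List[Tuple[float, int]]:
    """Return (final score, predicted category) for each unlabeled sample."""
    results = []
    for probs in prob_rows:
        initial = entropy(probs)
        predicted = max(range(len(probs)), key=lambda c: probs[c])
        freq = freqs.get(predicted, 0.0)  # unseen category: no penalty
        # Assumed combination rule, not taken from the source.
        results.append((initial * (1.0 - freq), predicted))
    return results


def select_batch(
    prob_rows: Sequence[Sequence[float]], labeled_labels: Sequence[int], k: int
) -> List[int]:
    """Indices of the k unlabeled samples with the highest final score."""
    scored = final_scores(prob_rows, sampling_frequencies(labeled_labels))
    ranked = sorted(range(len(scored)), key=lambda i: scored[i][0], reverse=True)
    return ranked[:k]
```

With labeled categories `[0, 0, 0, 1]`, two unlabeled samples of equal entropy are ranked so that the one predicted as the under-sampled category 1 is selected first. The selected samples would then be labeled manually, added to the labeled data, and the classifier retrained, as in the loop the abstract describes.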

Description

Technical field

[0001] The invention relates to a technique for using active learning to process short text classification, in particular to an active learning short text classification method and system based on sampling frequency optimization.

Background technique

[0002] With the prevalence of e-commerce and online communication, in many application fields, such as instant messaging, online chat logs, bulletin board system headlines, Internet news comments, Twitter, etc., short text content is flooding people's daily life. In the face of many needs such as topic recommendations and e-commerce chat robots derived from this, short text classification becomes very important. However, due to the non-standard expression of short text, labeling the required content will consume a lot of manpower, and the effect is relatively limited.

[0003] At present, the industry uses active learning to deal with short text classification problems, but when the mainstream active learning ...


Application Information

IPC(8): G06F16/33, G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045, G06F18/241
Inventor: 朱其立, 沈李斌, 廖千姿, 顾钰仪, 赵迎功, 吴海华
Owner: 上海乐言科技股份有限公司