
Text classification method based on deep multi-task learning

A multi-task learning and text classification technique in the field of natural language processing. It addresses the problems of insufficient training data and declining generalization ability on the test set, alleviating the shortage of training data and improving classification performance.

Inactive Publication Date: 2017-05-31
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

The scale of neural network parameters is large while the training data is small, so the model easily overfits and its generalization ability on the test set declines.
There are many methods to mitigate overfitting, such as parameter regularization and batch normalization, but they do not fundamentally solve the problem of insufficient training data.

Method used


Examples


Embodiment 1

[0038] As shown in Figures 1-2, a text classification method based on deep multi-task learning includes the following steps:

[0039] S1: Use word vectors and bidirectional recurrent networks to learn the document representation of the current task;

[0040] S2: Extract features from document representations of other tasks using convolutional neural networks;

[0041] S3: Learn a classifier using the document representation of the current task and the features of other tasks.
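Steps S2 and S3 above can be sketched as follows. This is an illustrative numpy sketch, not the patent's implementation: the 1-D convolution with max-over-time pooling for the auxiliary-task features (S2) and the softmax classifier over the concatenated representations (S3) use sizes, parameter shapes, and initialization that are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
H, F = 64, 32  # assumed hidden size and filter count

# S2 (sketch): a 1-D convolution plus max-over-time pooling turns another
# task's sequence of hidden states into one fixed-length feature vector.
def conv_features(other_task_states, kernel_width=3):
    T, D = other_task_states.shape
    filters = rng.normal(0, 0.1, (F, kernel_width * D))  # random filters, for illustration
    windows = [other_task_states[t:t + kernel_width].ravel()
               for t in range(T - kernel_width + 1)]
    conv = np.maximum(0.0, np.stack(windows) @ filters.T)  # ReLU activations
    return conv.max(axis=0)                                # max pooling -> (F,)

# S3 (sketch): softmax classifier over the concatenation of the current
# task's document representation and the auxiliary-task features.
def classify(current_rep, aux_features, n_classes=4):
    x = np.concatenate([current_rep, aux_features])
    W = rng.normal(0, 0.1, (n_classes, x.size))
    logits = W @ x
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

probs = classify(rng.normal(0, 1, 2 * H),
                 conv_features(rng.normal(0, 1, (10, 2 * H))))
```

In a real system the filters and classifier weights would of course be trained jointly rather than drawn at random; the sketch only shows the shapes and data flow.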

[0042] The specific process of step S1 is:

[0043] Segment all Chinese documents across all tasks into words. Assuming there are N distinct words in total, assign each word a unique index and represent it as a K-dimensional vector, so that all word vectors form an N*K matrix. The matrix is randomly initialized from a normal distribution, and the word-vector matrix is shared by all tasks;
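The shared embedding step above can be sketched as a minimal numpy snippet. The vocabulary size, vector dimension, and normal-distribution scale are assumptions for illustration; the patent only specifies an N*K matrix, normal initialization, and sharing across tasks.

```python
import numpy as np

# Hypothetical sizes: N words in the shared vocabulary, K-dimensional vectors.
N, K = 10000, 128

rng = np.random.default_rng(0)
# One N x K word-vector matrix, randomly initialized from a normal
# distribution and shared by every task, as the method describes.
embeddings = rng.normal(loc=0.0, scale=0.1, size=(N, K))

def embed(token_ids):
    """Look up the shared vectors for one segmented document."""
    return embeddings[token_ids]          # shape: (doc_len, K)

doc = embed([3, 17, 42])                  # a toy 3-word document
```

Because every task indexes into the same `embeddings` array, gradient updates from any task would update the common word vectors.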

[0044] The document representation of the current task is learned with word ve...
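A bidirectional recurrent network over the word vectors, as step S1 describes, can be sketched like this. The simple Elman-style recurrence, hidden size, and use of the concatenated final states as the document representation are assumptions; the patent does not specify the recurrent cell.

```python
import numpy as np

rng = np.random.default_rng(1)
K, H = 128, 64                      # word-vector and hidden sizes (assumed)

# Hypothetical parameters for one direction of a simple (Elman-style) RNN.
def init_rnn():
    return (rng.normal(0, 0.1, (H, K)),   # input weights W
            rng.normal(0, 0.1, (H, H)),   # recurrent weights U
            np.zeros(H))                  # bias b

fwd, bwd = init_rnn(), init_rnn()

def run(params, seq):
    W, U, b = params
    h = np.zeros(H)
    for x in seq:                         # scan over the word vectors
        h = np.tanh(W @ x + U @ h + b)
    return h                              # final hidden state, shape (H,)

def document_representation(word_vectors):
    """Concatenate the last states of the forward and backward passes."""
    h_f = run(fwd, word_vectors)
    h_b = run(bwd, word_vectors[::-1])    # same cell type, reversed input
    return np.concatenate([h_f, h_b])     # shape: (2 * H,)

rep = document_representation(rng.normal(0, 1, (5, K)))  # toy 5-word doc
```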



Abstract

The invention provides a text classification method based on deep multi-task learning. By means of a recurrent neural network trained on other tasks, combined with the learning ability of a convolutional neural network, an additional document representation is obtained. In other words, a large amount of external information is introduced and the semantic representation of a document is extended, which effectively alleviates the problem of insufficient training data. Compared with traditional multi-task learning methods, the convolutional neural network extracts features from the bottom-layer features of an auxiliary task, so the features of other tasks can be effectively transferred to the current task, improving the performance of text classification.

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and more particularly, to a text classification method based on deep multi-task learning.

Background technique

[0002] With the development of the Internet, there is growing demand for tasks such as topic identification, spam identification, and sentiment analysis, all of which are based on text classification. The goal of text classification is, given some documents and their corresponding class labels as a training set, to learn a classifier through an algorithm that can predict the class labels of unlabeled documents in the test set.

[0003] There are many text classification algorithms based on deep neural networks, including recurrent neural networks, convolutional neural networks, recurrent convolutional neural networks, and combinations of these networks with attention mechanisms, memory modules, etc. These neural networks have achieved good re...

Claims


Application Information

IPC(8): G06F17/30; G06N3/08
CPC: G06F16/35; G06N3/08
Inventors: 张梓滨, 潘嵘
Owner: SUN YAT SEN UNIV