Classification Of Sparsely Labeled Text Documents While Preserving Semantics

A technique for classifying sparsely labeled text documents, applied in the field of training text classifier systems, which addresses problems such as large document volumes and heavy manual labeling effort.

Pending Publication Date: 2020-12-15
IBM CORP

AI Technical Summary

Problems solved by technology

As a result, traditional text classification methods face enormous challenges, including the considerable manual labor required to classify these text documents.



Embodiment Construction

[0018] Embodiments of the present invention relate to text classification, and more particularly to a method of training a neural network architecture embodied as a natural language text classifier system. According to some embodiments of the present invention, a neural network architecture can be trained on sparsely annotated datasets, wherein the neural network architecture considers the semantics of text and achieves improved performance. According to some embodiments, the neural network architecture is configured to process text-based data in which only a small portion of the text is annotated, such as when only a few documents in a class of documents are annotated. According to one embodiment of the invention, the neural network architecture is configured to preserve semantics and sequential dependencies of semantics identified within the text, which can be used to classify the text (e.g., identifying sentiment, identifying groups of documents, etc.).

[0019] According to...

Abstract

The invention relates to a method and system of training a neural network and a computer program product, and the method includes the steps: receiving a text corpus containing a labeled portion and an unlabeled portion, extracting local n-gram features and a sequence of the local n-gram features from the text corpus, processing the text corpus, using convolutional layers, according to the local n-gram features to determine capsule parameters of capsules configured to preserve the sequence of the local n-gram features, performing a forward-oriented dynamic routing between the capsules using the capsule parameters to extract global characteristics of the text corpus, and processing the text corpus according to the global characteristics using a long short-term memory layer to extract global sequential text dependencies from the text corpus, wherein parameters of the neural network are updated according to the local n-gram features, the capsule parameters, global characteristics, and global sequential text dependencies.
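
The routing step named in the abstract can be illustrated with a minimal sketch. This is the generic routing-by-agreement procedure commonly used with capsule networks, written in plain Python; it is not the patent's specific forward-oriented variant, whose details are not given in this listing. The names `squash`, `softmax`, `route`, and `u_hat` are hypothetical, and the prediction vectors are assumed to have been produced already by earlier (convolutional) layers.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink a vector to length below 1 while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing by agreement (generic sketch, not the patent's variant).

    u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    Returns one output vector per upper capsule. Assumes iterations >= 1.
    """
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Each lower capsule distributes its output over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [
                sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                for d in range(dim)
            ]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the upper output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v
```

In this sketch, lower capsules whose predictions agree reinforce the same upper capsule, so its output vector grows longer, while conflicting predictions largely cancel out.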

Description

Technical field

[0001] The present invention relates generally to text classification, and more particularly to methods of training text classifier systems.

Background

[0002] Traditional text classification applies technology to understand documents, for example to comply with regulatory requirements, integrate internal operations, etc. These classification methods typically require a high percentage of the training data to be labeled in order to be effective. As a result, traditional text classification methods face enormous challenges, including the considerable manual labor required to classify these text documents.

Summary of the invention

[0003] According to some embodiments of the invention, a method of training a neural network to classify sparsely annotated text documents while preserving semantics includes receiving a text corpus comprising annotated portions and unannotated portions beyond the annotated portions; extract multiple local n-gram features...
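
As a rough illustration of the first two steps in paragraph [0003], the sketch below separates a toy corpus into annotated and unannotated parts and lists word-level n-grams in their original order. It is a simplification under stated assumptions: the abstract describes n-gram features being processed by convolutional layers, whereas this example only slides a window over whitespace tokens. The function names, the `None` convention for a missing label, and the sample documents are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical corpus format: (text, label); label None means "not annotated".
Document = Tuple[str, Optional[str]]


def split_corpus(corpus: List[Document]):
    """Separate the annotated documents from the unannotated ones."""
    labeled = [(text, label) for text, label in corpus if label is not None]
    unlabeled = [text for text, label in corpus if label is None]
    return labeled, unlabeled


def local_ngrams(text: str, n: int = 3) -> List[Tuple[str, ...]]:
    """Return word n-grams in document order, so their sequence is kept."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy data, invented for illustration only.
corpus: List[Document] = [
    ("The contract must comply with regulatory requirements", "compliance"),
    ("Quarterly results were integrated into internal operations", None),
]

labeled, unlabeled = split_corpus(corpus)
trigrams = local_ngrams(corpus[0][0], n=3)
```

Because the n-grams are returned as an ordered list rather than a set or a bag of counts, later stages can still see where each feature occurred in the document.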

Application Information

IPC(8): G06F16/35, G06N3/04, G06N3/08
CPC: G06F16/35, G06N3/049, G06N3/08, G06N3/045, G06F40/216, G06F40/30, G06F40/284, G06N3/044, G06F17/15, G06N3/047
Inventor J. J. Thomas, A. E. Petrov, Wanting Wang, M. Allard
Owner IBM CORP