Classification Of Sparsely Labeled Text Documents While Preserving Semantics

A technique for sparsely labeled text documents, applied to training text classifier systems, which addresses the manual labor needed to label large volumes of documents.

Pending Publication Date: 2020-12-15

IBM CORP


AI Technical Summary

Problems solved by technology

Traditional text classification methods typically require a high percentage of the training data to be labeled, so classifying these text documents demands a great deal of manual labor.


Embodiment Construction

[0018] Embodiments of the present invention relate to text classification, and more particularly to a method of training a neural network architecture embodied as a natural language text classifier system. According to some embodiments of the present invention, a neural network architecture can be trained on sparsely annotated datasets, wherein the neural network architecture considers the semantics of text and achieves improved performance. According to some embodiments, the neural network architecture is configured to process text-based data in which only a small portion of the text is annotated, such as when only a few documents in a class of documents are annotated. According to one embodiment of the invention, the neural network architecture is configured to preserve semantics and sequential dependencies of semantics identified within the text, which can be used to classify the text (e.g., identifying sentiment, identifying groups of documents, etc.).

[0019] According to...


Abstract

The invention relates to a method and system of training a neural network and a computer program product. The method includes the steps of: receiving a text corpus containing a labeled portion and an unlabeled portion; extracting local n-gram features and a sequence of the local n-gram features from the text corpus; processing the text corpus, using convolutional layers, according to the local n-gram features to determine capsule parameters of capsules configured to preserve the sequence of the local n-gram features; performing a forward-oriented dynamic routing between the capsules using the capsule parameters to extract global characteristics of the text corpus; and processing the text corpus according to the global characteristics using a long short-term memory layer to extract global sequential text dependencies from the text corpus, wherein parameters of the neural network are updated according to the local n-gram features, the capsule parameters, the global characteristics, and the global sequential text dependencies.
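The routing step in the abstract can be made more concrete with a small example. The abstract does not spell out what "forward-oriented" dynamic routing involves, so the sketch below substitutes the widely known routing-by-agreement procedure for capsule networks and should be read as an illustration of the general idea, not as the claimed method. The names `squash`, `softmax`, `dynamic_routing`, and `u_hat` are assumptions introduced here.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Standard routing-by-agreement (an assumed stand-in for the patent's routing).

    u_hat[i][j] is the prediction made by lower-level capsule i
    for upper-level capsule j.
    """
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    # Routing logits start at zero, so every upper capsule is equally likely.
    b = [[0.0] * n_out for _ in range(n_in)]
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its output.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the upper capsule's output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

In this sketch, an upper capsule whose incoming predictions agree ends up with a longer output vector than one whose predictions cancel out, which is how routing is commonly said to surface global characteristics from local features.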

Description

Technical Field

[0001] The present invention relates generally to text classification, and more particularly to methods of training text classifier systems.

Background

[0002] Traditional text classification applies technology to understand documents, for example to comply with regulatory requirements, integrate internal operations, etc. These text classification methods typically require a high percentage of the training data to be labeled in order to be effective. As a result, traditional text classification methods face enormous challenges, including requiring a lot of manual labor in classifying these text documents.

Summary of the Invention

[0003] According to some embodiments of the invention, a method of training a neural network to classify sparsely annotated text documents while preserving semantics includes receiving a text corpus comprising annotated portions and unannotated portions beyond the annotated portions; extracting multiple local n-gram features...
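The first two steps named in the summary (receiving a corpus with annotated and unannotated portions, and extracting local n-gram features in order) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the patent extracts n-gram features with convolutional layers, whereas the code below uses plain sliding windows over whitespace tokens purely to show what an ordered sequence of local n-grams looks like. The corpus layout and the names `split_corpus` and `local_ngrams` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical corpus layout: (text, label) pairs, with label None when unannotated.
Corpus = List[Tuple[str, Optional[str]]]


def split_corpus(corpus: Corpus) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate the annotated portion from the unannotated portion."""
    labeled = [(text, label) for text, label in corpus if label is not None]
    unlabeled = [text for text, label in corpus if label is None]
    return labeled, unlabeled


def local_ngrams(text: str, n: int = 3) -> List[Tuple[str, ...]]:
    """Return the n-grams of a text in their original order.

    A sliding window over tokens stands in for the convolutional
    feature extraction described in the source.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


corpus: Corpus = [
    ("The filing meets the regulatory requirements", "compliance"),
    ("Quarterly results were integrated into operations", None),
    ("Audit notes pending review", None),
]
labeled, unlabeled = split_corpus(corpus)
```

Keeping the n-grams in a list, not a set or a bag of counts, preserves their order, which is the property the later layers are described as relying on.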


Application Information


IPC(8): G06F16/35, G06N3/04, G06N3/08

CPC: G06F16/35, G06N3/049, G06N3/08, G06N3/045, G06F40/216, G06F40/30, G06F40/284, G06N3/044, G06F17/15, G06N3/047

Inventors: J.J.托马斯, A.E.佩特罗夫, 王婉婷, M.阿拉德

Owner IBM CORP
