Text data representation learning using random document embedding

A machine learning technique for text data, applicable to machine learning, neural learning methods, and electrical digital data processing. It addresses problems such as costly Word Mover's Distance (WMD) computation and the difficulty of using WMD for feature embedding.

Active Publication Date: 2020-04-24
IBM CORP

AI Technical Summary

Problems solved by technology

However, Word Mover's Distance (WMD) is computationally expensive and is difficult to use for feature embedding beyond simple K-Nearest Neighbors (KNN) machine learning methods.
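
For context, WMD is usually formulated in the literature as an optimal-transport problem. The formulation below is that standard textbook form, not text taken from this patent, and the symbols (x, y, v_i, w_j, a_i, b_j, T) are notation assumed here for illustration: document x has n distinct words with embeddings v_i and normalized weights a_i, and document y has m distinct words with embeddings w_j and normalized weights b_j.

```latex
\mathrm{WMD}(x, y) \;=\; \min_{T \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij} \, \lVert v_i - w_j \rVert_2
\quad \text{subject to} \quad
\sum_{j=1}^{m} T_{ij} = a_i \;\; (i = 1, \dots, n), \qquad
\sum_{i=1}^{n} T_{ij} = b_j \;\; (j = 1, \dots, m)
```

Under this formulation, each distance requires solving a linear program over an n-by-m transport matrix, and a KNN classifier needs such a distance for every pair of query and training documents, which is where the cost noted above comes from.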

Embodiment Construction

[0027] Various embodiments of the invention are described herein with reference to the associated drawings. Alternative embodiments of the invention may be devised without departing from the scope of the invention. In the following description and drawings, various connections and positional relationships (e.g., above, below, adjacent, etc.) are set forth between elements. Unless stated otherwise, such connections and/or positional relationships may be direct or indirect, and the invention is not intended to be limited in this respect. Accordingly, a coupling of entities may refer to a direct or indirect coupling, and a positional relationship between entities may be a direct or indirect positional relationship. Furthermore, the various tasks and process steps described herein may be incorporated into a more comprehensive procedure or process having additional steps or functionality not described in detail herein.

[0028] The following definitions and abbreviations are used t...

Abstract

Embodiments of the present invention provide a computer-implemented method for performing unsupervised feature representation learning for text data. The method generates reference text data having a set of random text sequences, in which each text sequence of the set of random text sequences is of a random length and comprises a number of random words, and in which each random length is sampled between a minimum length and a maximum length. The random words of each text sequence in the set are drawn from a distribution. The method generates a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data. The method provides the feature matrix as an input to one or more machine learning models.
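
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: words are represented as embedding vectors, the "distribution" is assumed to be a standard Gaussian, the document distance is a relaxed lower bound on WMD swapped in to keep the code short, and the `exp(-gamma * distance)` transform and every function name here are hypothetical choices not stated in the source.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_wmd(doc_a, doc_b):
    """Stand-in distance (assumption): relaxed Word Mover's Distance.

    Each word moves entirely to its nearest word in the other document,
    with uniform word weights. This is a lower bound on exact WMD, used
    here only to avoid solving a linear program.
    """
    def one_way(src, dst):
        return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)

    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))


def random_documents(num_docs, min_len, max_len, dim, rng):
    """Generate random text sequences as lists of random word vectors.

    Each length is sampled uniformly from [min_len, max_len]; each word
    vector is drawn from a standard Gaussian (assumed distribution).
    """
    docs = []
    for _ in range(num_docs):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        docs.append([[rng.gauss(0.0, 1.0) for _ in range(dim)]
                     for _ in range(length)])
    return docs


def feature_matrix(raw_docs, reference_docs, gamma=1.0):
    """One row per raw document, one column per random reference document.

    The exp(-gamma * distance) transform is an assumption of this sketch.
    """
    return [[math.exp(-gamma * relaxed_wmd(doc, ref)) for ref in reference_docs]
            for doc in raw_docs]


rng = random.Random(0)
# Placeholder for embedded raw text: 4 documents of 3 to 6 words each.
raw_docs = random_documents(4, 3, 6, dim=5, rng=rng)
# Reference text data: 8 random sequences of 1 to 5 random words each.
refs = random_documents(8, 1, 5, dim=5, rng=rng)
Z = feature_matrix(raw_docs, refs)
print(len(Z), len(Z[0]))  # 4 8
```

The resulting matrix `Z` has a fixed number of columns regardless of document length, so it could be passed to an ordinary downstream model such as a linear classifier instead of being limited to KNN.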

Description

Technical field

[0001] The present invention relates generally to machine learning systems, and more particularly, to performing machine learning processes by using random document embeddings of text data.

Background

[0002] The phrase "machine learning" broadly describes the functionality of electronic systems that learn from data. A machine learning system, engine, or module may include a trainable machine learning algorithm, which may be trained, such as in an external cloud environment, to learn functional relationships between inputs and outputs, where the functional relationships are not currently known.

[0003] The phrase "text data" broadly describes the data structure of an electronic system that includes one or more text sequences, where each text sequence holds a grouping of one or more words. Examples of text sequences include sentences, paragraphs, documents, etc. Examples of text data include sentences, paragraphs, documents, etc. The phrase "seq...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06N20/00
CPC: G06N20/10, G06N3/088, G06N7/01, G06N3/008, G06N20/00, G06F16/3331
Inventors: Lingfei Wu (吴凌飞), M. J. Witbrock (M·J·维特布鲁克)
Owner: IBM CORP