Cross-language unsupervised classification with multi-view transfer learning

A multi-view transfer learning technology, applied in semantic analysis, natural language translation, neural learning methods, and the like, which addresses problems such as the degraded generalization performance of fine-tuned models and the failure to capture cross-language semantic similarity well.

Pending Publication Date: 2021-12-17
BAIDU USA LLC

AI Technical Summary

Problems solved by technology

However, under the "zero parallel resources" setting, encoders trained with self-supervised masked language modeling within each language may not capture cross-language semantic similarity well, which can harm the generalization performance of fine-tuned models.




Embodiment Construction

[0053] In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without these details. Furthermore, those skilled in the art will appreciate that the embodiments of the present disclosure described below can be implemented in various ways, such as a process, an apparatus, a system, a device, or a method on a tangible computer readable medium.

[0054] Components or modules shown in the figures are illustrative of example embodiments of the disclosure and are intended to avoid obscuring the disclosure. It should also be understood that, throughout this discussion, components may be described as separate functional units, which may include sub-units, but those skilled in the art will recognize that various components, or portions thereof, may be divided into separate components or may be integrated together...



Abstract

The disclosure provides cross-language unsupervised classification with multi-view transfer learning. Embodiments of an unsupervised cross-language sentiment classification model, referred to as a multi-view encoder classifier (MVEC), are presented that leverage an unsupervised machine translation (UMT) system and a language discriminator. Unlike previous language model (LM)-based fine-tuning methods that adjust parameters based only on the classification errors of the training data, embodiments employ the encoder-decoder framework of UMT as a regularization component on the shared network parameters. In one or more embodiments, the cross-language encoder learns a shared representation that is effective both for reconstructing input sentences in either language and for generating more representative views of the input for classification. Experiments on five language pairs show that MVEC embodiments significantly outperform other models on 8 of 11 sentiment classification tasks.
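
The abstract describes a shared cross-language encoder coupled to three heads: a UMT-style decoder used for reconstruction (the regularization component), a language discriminator, and a sentiment classifier. The following PyTorch sketch is illustrative only; the module sizes, names (MVECSketch, train_step), pooling choice, and loss weights are assumptions and are not taken from the disclosed embodiments:

# Minimal sketch of an MVEC-style model: one shared encoder, three heads.
# All dimensions, names, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

class MVECSketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_classes=2, n_langs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared cross-language encoder (one transformer layer for brevity).
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        # Decoder head: reconstructs the input tokens (UMT-style regularizer).
        self.decoder = nn.Linear(d_model, vocab_size)
        # Language discriminator: predicts which language produced the encoding.
        self.discriminator = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_langs))
        # Sentiment classifier over the pooled shared representation.
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))          # (B, T, d_model)
        pooled = h.mean(dim=1)                           # simple mean pooling
        return {
            "recon_logits": self.decoder(h),             # (B, T, vocab)
            "lang_logits": self.discriminator(pooled),   # (B, n_langs)
            "class_logits": self.classifier(pooled),     # (B, n_classes)
        }

def train_step(model, batch, opt, lambda_rec=1.0, lambda_adv=0.1):
    """One hypothetical training step: classification loss on labeled source
    sentences, regularized by reconstruction and a language-discrimination term."""
    out = model(batch["token_ids"])
    ce = nn.functional.cross_entropy
    loss_cls = ce(out["class_logits"], batch["labels"])
    loss_rec = ce(out["recon_logits"].transpose(1, 2), batch["token_ids"])
    # Adversarial sign handling (e.g., a gradient reversal layer) is omitted.
    loss_adv = ce(out["lang_logits"], batch["lang_ids"])
    loss = loss_cls + lambda_rec * loss_rec + lambda_adv * loss_adv
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data, just to show the expected batch layout.
model = MVECSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = {
    "token_ids": torch.randint(0, 32000, (8, 16)),
    "labels": torch.randint(0, 2, (8,)),
    "lang_ids": torch.randint(0, 2, (8,)),
}
print(train_step(model, batch, opt))

In a full MVEC-style system, the reconstruction term comes from the UMT encoder-decoder framework and the discriminator is trained adversarially; the sketch only illustrates how the three objectives share one set of encoder parameters.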

Description

Technical Field
[0001] The present disclosure relates generally to systems and methods for computer learning that can provide improved computer performance, features, and uses. More specifically, the present disclosure relates to systems and methods for classification.

Background
[0002] Recent neural network models have achieved remarkable performance in sentiment classification for English and other languages. However, their success depends heavily on the availability of large amounts of labeled data or parallel corpora. In practice, some low-resource languages or applications have limited labeled data, or even no labeled data or parallel corpora, which may prevent training a powerful and accurate classifier.

[0003] To build classification models (such as sentiment classification models) for low-resource languages, researchers have recently developed cross-lingual text classification (CLTC) models (see Ruochen Xu and Yiming Yang, "Cross-lingual distillation for text classification", ...).


Application Information

IPC(8): G06F16/35, G06F40/126, G06F40/151, G06F40/58, G06N3/08
CPC: G06F16/35, G06F40/58, G06F40/151, G06F40/126, G06N3/08, G06F40/30, G06F40/216, G06F40/284, G06N3/088, G06N3/044, G06N3/045, G06F40/197, G06N3/02, G06F16/353
Inventor: 费洪亮, 李平
Owner: BAIDU USA LLC