Parallel decoding using autoregressive machine learning models

An autoregressive model technique, applied in machine learning, computing models, ensemble learning, etc., that addresses the problem of quality deterioration and improves generation speed.

Pending Publication Date: 2019-09-17
GOOGLE LLC

AI Technical Summary

Problems solved by technology

While some techniques achieve speedups of multiple orders of magnitude in speech synthesis, to the best of our knowledge the largest published wall-clock improvement for non-batched decoding in machine translation is about 4x, at the cost of a significant deterioration in quality.



Embodiment Construction

[0022] In sequence-to-sequence problems, an input sequence (x_1, ..., x_n) is given, and the goal is to predict the corresponding output sequence (y_1, ..., y_m). In machine translation, these sequences may be source and target sentences; in image super-resolution, they may be low-resolution and high-resolution images.

[0023] Suppose the system has learned an autoregressive scoring model p(y|x) that factorizes according to the left-to-right decomposition

[0024] \log p(y \mid x) = \sum_{j=0}^{m-1} \log p(y_{j+1} \mid y_{\le j}, x)
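The left-to-right decomposition in [0023] means that the log-score of a complete output is a sum of per-token conditional log-scores. The following is a minimal sketch of that sum. The callable `cond_prob` and the toy `uniform_cond_prob` model are hypothetical stand-ins for a trained model and do not come from the patent text.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface (an assumption for illustration):
# cond_prob(token, prefix, x) returns p(token | prefix, x).
CondProb = Callable[[str, Sequence[str], Sequence[str]], float]


def sequence_log_prob(cond_prob: CondProb,
                      x: Sequence[str],
                      y: Sequence[str]) -> float:
    """Sum the conditional log-probabilities of y, left to right."""
    total = 0.0
    for j in range(len(y)):
        # Each token is scored given all tokens before it and the input x.
        total += math.log(cond_prob(y[j], y[:j], x))
    return total


def uniform_cond_prob(token: str,
                      prefix: Sequence[str],
                      x: Sequence[str]) -> float:
    """Toy model: every token has probability 0.25 regardless of context."""
    return 0.25
```

With the toy model, a three-token output scores 3 * log(0.25), one term per position.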

[0025] Given an input x, the system can use the model to predict the output by greedy decoding, as follows. Starting from j = 0, the system iteratively extends the prediction with the highest-scoring token ŷ_{j+1} = argmax_{y_{j+1}} p(y_{j+1} | ŷ_{≤j}, x) and sets j ← j+1 until the termination condition is met. For language generation problems, the system usually stops once a special end-of-sequence token has been generated. For the image generation problem, the system simply decodes for a fixed num...
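The greedy decoding loop described in [0025] can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the `score_next` interface, the `EOS` marker string, the `max_len` cap, and the toy `toy_copy_model` are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"  # assumed end-of-sequence marker (not named in the source)

# Hypothetical interface: score_next(x, prefix) returns a score for every
# candidate next token, given the input x and the tokens decoded so far.
ScoreNext = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def greedy_decode(score_next: ScoreNext,
                  x: Sequence[str],
                  max_len: int = 50) -> List[str]:
    """Extend the output one token at a time with the highest-scoring token."""
    y: List[str] = []
    for _ in range(max_len):          # safety cap, an assumption of this sketch
        scores = score_next(x, y)
        best = max(scores, key=scores.get)
        y.append(best)
        if best == EOS:               # termination condition for text output
            break
    return y


def toy_copy_model(x: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in model: prefers copying x token by token, then EOS."""
    target = list(x) + [EOS]
    j = len(prefix)
    want = target[j] if j < len(target) else EOS
    vocab = set(x) | {EOS}
    return {tok: (1.0 if tok == want else 0.0) for tok in vocab}
```

Each iteration requires one model call, so an output of m tokens costs m sequential calls. That sequential cost is what the parallel decoding method targets.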


Abstract

The invention relates to parallel decoding using autoregressive machine learning models, and provides methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output from an autoregressive sequence-to-sequence model in parallel. In one aspect, a block-parallel decoding method exploits the fact that some architectures can score a sequence in sublinear time. Predictions are generated for multiple time steps at once, and the longest prefix verified by the scoring model is then accepted. The method can improve the speed of greedy decoding without affecting performance.
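The predict-then-verify idea in the abstract can be sketched as follows. This is a toy illustration under stated assumptions: `base_next` stands in for the scoring model's greedy choice, `noisy_propose` stands in for a multi-step predictor, and both are hypothetical. The verification loop runs sequentially here for readability, whereas the method described relies on scoring all proposed positions in parallel.

```python
from typing import List, Sequence, Tuple

EOS = "<eos>"  # assumed end-of-sequence marker


def base_next(x: Sequence[str], prefix: Sequence[str]) -> str:
    """Toy scoring model: its greedy choice copies x, then emits EOS."""
    target = list(x) + [EOS]
    return target[min(len(prefix), len(target) - 1)]


def noisy_propose(x: Sequence[str], prefix: Sequence[str], k: int) -> List[str]:
    """Toy block predictor: the first two guesses agree with the scoring
    model, the remaining guesses are deliberately wrong."""
    block: List[str] = []
    for i in range(k):
        if i < 2:
            block.append(base_next(x, list(prefix) + block))
        else:
            block.append("<wrong>")
    return block


def block_parallel_decode(x: Sequence[str], k: int = 4,
                          max_len: int = 50) -> Tuple[List[str], int]:
    """Return the decoded tokens and the number of predict/verify rounds."""
    y: List[str] = []
    rounds = 0
    while len(y) < max_len and (not y or y[-1] != EOS):
        rounds += 1
        block = noisy_propose(x, y, k)            # predict k steps at once
        accepted: List[str] = []
        for tok in block:                         # verify against the scorer
            if base_next(x, y + accepted) != tok:
                break                             # first disagreement ends the prefix
            accepted.append(tok)
            if tok == EOS:
                break
        if not accepted:                          # always make progress
            accepted = [base_next(x, y)]
        y.extend(accepted[: max_len - len(y)])    # keep the verified prefix
    return y, rounds
```

On the five-token input `["a", "b", "c", "d", "e"]`, this toy setup produces the same six-token output as greedy decoding in three rounds, because two proposed tokens are verified per round.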

Description

[0001] Cross-Reference to Related Applications

[0002] This application claims the benefit of priority under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 62/673,796, filed May 18, 2018, entitled "Parallel Decoding Using Autoregressive Machine Learning Models", which is incorporated herein by reference.

Background

[0003] Decoding and generating output from an autoregressive model is inherently sequential, because the model must be fed its own previous predictions. This makes large autoregressive models potentially difficult to deploy in production, especially in low-latency environments.

[0004] Three currently relevant approaches to overcoming this difficulty may be mentioned. All of them share the problem that the faster they run, the worse the output quality becomes.

[0005] The first approach is fertility prediction and noisy parallel decoding. The approach is described in Non-Autoregressive Neural Machine Translation by Gu e...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06N3/04, G06N20/00, G06V10/764
CPC: G06N20/00, G06F40/47, G06N3/045, G06N20/20, G06N3/08, G06V10/82, G06V10/764, G06F18/2413, G06N7/00, G06F18/2185, G06F18/22
Inventor: Noam M. Shazeer, Jakob D. Uszkoreit, Mitchell Thomas Stern
Owner: GOOGLE LLC