
Novel multi-head attention mechanism

An attention and multi-head technology, applied to computing models, machine learning and computing, which can solve problems such as high space complexity, destruction of the continuous structure of the sequence, and large consumption of computing space, with the effects of reducing storage space consumption, reducing model complexity and improving the degree of parallelism.

Inactive Publication Date: 2020-05-26
SHAN DONG MSUN HEALTH TECH GRP CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the original multi-head attention mechanism has inherent disadvantages. First, its space occupation is proportional to the square of the length of the processed sequence, so the space complexity is high and a large amount of computational space is consumed when processing longer sequences. Second, the attention mechanism establishes relationships between all elements in the sequence, but in practical language processing it is not necessary to model the relationship between every pair of elements; that is, much of the computation in the traditional multi-head attention matrix is wasted, which slows down sequence processing.
[0004] Traditional approaches to the space complexity of the multi-head attention mechanism, such as Transformer-XL, divide the sequence into blocks, and this segmentation destroys the original continuous structure of the sequence and the characteristics of the original data. Moreover, in the connection between blocks, an attention mechanism similar to the Decoder in the Transformer is used, which adds a recurrent structure to the model's processing, reducing the degree of parallelization and the overall performance of the model.
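
To make the space-complexity claim concrete, the following is a minimal sketch (not taken from the patent; the window radius k and the counting function are illustrative assumptions) comparing how many attention-score entries must be held in memory when every position attends to every other position versus when each position attends only to a local window of 2k+1 neighbours.

def score_matrix_entries(seq_len: int, heads: int, window: int = None) -> int:
    # Number of attention-score entries stored across all heads.
    if window is None:
        return heads * seq_len * seq_len          # global attention: l x l per head
    return heads * seq_len * (2 * window + 1)     # local attention: l x (2k+1) per head

l, h, k = 4096, 8, 64                             # hypothetical values for illustration
print(score_matrix_entries(l, h))                 # 134217728 entries, grows with l**2
print(score_matrix_entries(l, h, k))              # 4227072 entries, grows linearly with l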



Embodiment Construction

[0028] The present invention will be further described below.

[0029] A novel multi-head attention mechanism comprises the following steps:

[0030] a) Concatenate the equal-dimensional vector sequence input to the multi-head attention mechanism into a matrix E, where E_{i,j} denotes the entry in row i and column j of the matrix, 1 ≤ i ≤ l with l the sequence length, and 1 ≤ j ≤ d with d the dimension of the vector sequence;

[0031] b) Set the model hyperparameters k, h and m, all positive integers, where k is the length range over which context is established in the multi-head attention mechanism, h is the number of heads, and m is the vector dimension of the hidden layer processed by each head;

[0032] c) Initialize the sets of parameter matrices separately; each set contains h parameter matrices with d rows and m columns, and for the i-th matrix in a set, 1 ≤ i ≤ h, fo...
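
Because the description above is truncated after step c) and the names of the parameter-matrix sets are not preserved in this text, the following is only a minimal sketch of steps a) through c) under stated assumptions: the two sets are taken to be query and key projections (here called W_Q and W_K), and NumPy stands in for whichever framework the inventors actually used.

import numpy as np

rng = np.random.default_rng(0)

# a) stack the l equal-dimensional input vectors (dimension d) into a matrix E of shape (l, d)
vectors = [rng.normal(size=64) for _ in range(128)]   # hypothetical input sequence, l=128, d=64
E = np.stack(vectors)                                 # E[i, j] is the entry in row i, column j
l, d = E.shape

# b) hyperparameters: k = context length range, h = number of heads,
#    m = hidden-layer vector dimension handled by each head
k, h, m = 8, 4, 16

# c) initialize two sets of h parameter matrices, each with d rows and m columns
W_Q = [rng.normal(scale=d ** -0.5, size=(d, m)) for _ in range(h)]   # assumed query projections
W_K = [rng.normal(scale=d ** -0.5, size=(d, m)) for _ in range(h)]   # assumed key projections

Q = [E @ W_Q[i] for i in range(h)]   # per-head query representations, each of shape (l, m)
K = [E @ W_K[i] for i in range(h)]   # per-head key representations, each of shape (l, m)

Each head then holds an (l, m) query matrix and an (l, m) key matrix; a local attention step restricted to the context range k would consume these, which is what keeps the per-head score matrices proportional to l rather than to its square.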


Abstract

The invention discloses a novel multi-head attention mechanism. The method employs local attention; compared with the global attention adopted by the traditional multi-head attention mechanism, the complexity of the model is reduced, and the sizes of all matrices in the computation are proportional only to the length of the sequence, rather than to the square of the sequence length as in the traditional attention mechanism, so the storage space consumed by the model is reduced to a large extent. During computation, unlike the solution in Transformer-XL, the sequence is not divided into blocks, so the sequential characteristics of the original sequence are preserved to a great extent; softmax is used to establish global semantics, and compared with the cross-block connection used in Transformer-XL, the degree of parallelism of the model is improved.
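
The abstract's central claim is that every matrix built during the computation scales with the sequence length l rather than with its square, and that this is achieved without splitting the sequence into blocks. The sketch below shows one way a windowed (local) attention with that property can be written; the zero-padding at the sequence ends, the softmax taken over each window, and the presence of a value projection V are illustrative assumptions, not details taken from the patent.

import numpy as np

def local_attention(Q, K, V, k):
    # Each position attends only to the 2k+1 positions within distance k,
    # so the score tensor has shape (l, 2k+1) instead of (l, l).
    l, m = Q.shape
    K_pad = np.pad(K, ((k, k), (0, 0)))                           # zero-pad so edge positions have full windows
    V_pad = np.pad(V, ((k, k), (0, 0)))
    idx = np.arange(l)[:, None] + np.arange(2 * k + 1)[None, :]   # window indices, shape (l, 2k+1)
    win_K = K_pad[idx]                                            # (l, 2k+1, m)
    win_V = V_pad[idx]                                            # (l, 2k+1, m)
    scores = np.einsum('im,iwm->iw', Q, win_K) / np.sqrt(m)       # (l, 2k+1)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)                 # softmax over each local window
    return np.einsum('iw,iwm->im', weights, win_V)                # (l, m)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(128, 16)) for _ in range(3))
print(local_attention(Q, K, V, k=8).shape)                        # (128, 16)

Because every window is gathered in a single vectorized indexing step, there is no per-block loop or cross-block recurrence, which is consistent with the abstract's point that avoiding Transformer-XL-style block connections keeps the computation parallel.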

Description

Technical Field

[0001] The invention relates to the technical fields of artificial intelligence, machine learning and data mining, and in particular to a novel multi-head attention mechanism.

Background

[0002] With the continuing integration of artificial intelligence and machine learning technology into natural language processing, more and more deep learning techniques have been applied in the field. Among them, GPT, BERT, RoBERTa, ALBERT, XL-Net and other Transformer-based methods built on the multi-head attention mechanism have won praise from industry and are increasingly applied in natural language processing and other fields.

[0003] However, the original multi-head attention mechanism has inherent disadvantages: first, its space occupation is proportional to the square of the length of the processed sequence, so the space complexity is high, which will...

Claims


Application Information

IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 张福鑫, 吴军, 张伯政, 樊昭磊, 张述睿
Owner: SHAN DONG MSUN HEALTH TECH GRP CO LTD