Novel multi-head attention mechanism

An attention and multi-head technology, applied in the fields of computing models, machine learning, and computing, which addresses problems such as high space complexity, destruction of the sequence-continuity structure, and large computing-space occupation, with the effects of reducing storage-space consumption, reducing model complexity, and improving the degree of parallelism.

Inactive Publication Date: 2020-05-26
SHAN DONG MSUN HEALTH TECH GRP CO LTD


Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Novel multi-head attention mechanism
  • Novel multi-head attention mechanism
  • Novel multi-head attention mechanism


Example Embodiment

[0028] The present invention will be further described below.

[0029] A novel multi-head attention mechanism comprises the following steps (an illustrative sketch is given after the last step below):

[0030] a) Concatenate the equal-dimensional vector sequence input to the multi-head attention mechanism into a matrix E, where E_{i,j} denotes the entry in row i and column j of the matrix, 1≤i≤l, where l is the length of the sequence, and 1≤j≤d, where d is the dimension of the vectors in the sequence;

[0031] b) Set the model hyperparameters k, h, and m, all positive integers: k is the length range over which context is established in the multi-head attention mechanism, h is the number of heads in the multi-head attention mechanism, and m is the hidden-layer vector dimension processed by each head;

[0032] c) Initialize the sets of parameter matrices separately; each set contains h parameter matrices with d rows and m columns, and for the i-th matrix in each set, 1≤i≤h, ...
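
The remaining steps of this embodiment are cut off on this page. Purely as a reading aid, the following is a minimal Python/NumPy sketch of how steps a) through c) could be combined with a windowed (local) attention computation of the kind described in the Abstract; the function name, the query/key/value interpretation of the three parameter-matrix sets, and the exact windowing are assumptions rather than the patent's own specification.

    # Hypothetical sketch only: step a) stacks the input vectors into E (l x d),
    # step b) fixes the hyperparameters k, h, m, and step c) initializes three
    # sets of h parameter matrices of shape (d, m). Treating the three sets as
    # query/key/value projections and restricting attention to a window of
    # radius k around each position are assumptions, not the patent text.
    import numpy as np

    def local_multi_head_attention(E, k, h, m, seed=0):
        l, d = E.shape
        rng = np.random.default_rng(seed)
        # Step c): three sets, each holding h matrices with d rows and m columns.
        Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(h, d, m)) for _ in range(3))
        heads = []
        for i in range(h):
            Q, K, V = E @ Wq[i], E @ Wk[i], E @ Wv[i]      # each of shape (l, m)
            out = np.zeros((l, m))
            for t in range(l):
                lo, hi = max(0, t - k), min(l, t + k + 1)  # local context of radius k
                scores = Q[t] @ K[lo:hi].T / np.sqrt(m)    # window-sized, not l-sized
                weights = np.exp(scores - scores.max())
                weights /= weights.sum()                   # softmax over the window
                out[t] = weights @ V[lo:hi]
            heads.append(out)
        return np.concatenate(heads, axis=-1)              # shape (l, h * m)

    # Example: l = 6, d = 8, window radius k = 2, h = 2 heads, m = 4 dims per head.
    E = np.random.default_rng(1).normal(size=(6, 8))
    print(local_multi_head_attention(E, k=2, h=2, m=4).shape)  # (6, 8)

Note that every intermediate array in this sketch has at most l rows or a window-sized dimension, which is the property the Abstract attributes to the method.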



Abstract

The invention discloses a novel multi-head attention mechanism. The method employs local attention; compared with the global attention adopted by the traditional multi-head attention mechanism, the complexity of the model is reduced, and the sizes of all matrices produced during the computation are proportional only to the length of the sequence, rather than to the square of the sequence length as in the traditional attention mechanism, so the storage-space consumption of the model is reduced to a large extent. During the computation, unlike the scheme in Transformer-XL, the sequence is not processed in blocks, so the sequential characteristics of the original sequence are largely preserved; softmax is used to establish global semantics, and compared with the cross-block connection used in Transformer-XL, the degree of parallelism of the model is improved.
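
As a rough illustration of the storage claim (score matrices proportional to the sequence length rather than its square), the short calculation below assumes a local window of width 2k + 1 around each position; the concrete numbers are examples and do not come from the patent.

    # Assumed example numbers: sequence length l, window radius k, h heads.
    l, k, h = 4096, 64, 8
    full_scores = h * l * l                 # traditional attention: grows with l**2
    local_scores = h * l * (2 * k + 1)      # local attention: grows linearly with l
    print(full_scores // local_scores)      # roughly 31x fewer score entries here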

Description

technical field

[0001] The invention relates to the technical fields of artificial intelligence, machine learning, and data mining, and in particular to a novel multi-head attention mechanism.

Background technique

[0002] With the continuous integration of artificial intelligence and machine learning technology into the field of natural language processing, more and more deep learning techniques have been applied to natural language processing. Among them, GPT, BERT, RoBERTa, ALBERT, XL-Net, and other methods based on the Transformer and its multi-head attention mechanism have won praise from the industry and are increasingly being applied in natural language processing and other fields.

[0003] However, the original multi-head attention mechanism has inherent disadvantages. First, its space occupation is proportional to the square of the length of the processed sequence; this high space complexity means that processing longer sequences takes up a large amount of computational space. Second, the attention mechanism establishes relationships between all elements in the sequence, but in practical language processing it is not necessary to model every element of every sequence, so there is a great deal of wasted computation in the traditional multi-head attention matrix.
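
For a sense of scale of the quadratic-space problem in paragraph [0003], here is a short calculation with assumed numbers (not taken from the patent):

    # A single head's l x l float32 score matrix for an assumed sequence length.
    l = 4096
    bytes_full = l * l * 4                  # 4 bytes per float32 entry
    print(bytes_full / 2 ** 20)             # 64.0 MiB for one head, one layer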


Application Information

IPC(8): G06N20/00
CPC: G06N20/00
Inventors: 张福鑫, 吴军, 张伯政, 樊昭磊, 张述睿
Owner: SHAN DONG MSUN HEALTH TECH GRP CO LTD