Novel multi-head attention mechanism

An attention and multi-head technology, applied in the fields of computing models, machine learning and computing. It addresses the high space complexity, the disruption of sequence continuity and the large memory footprint of existing approaches, with the effect of reducing storage consumption, reducing model complexity and improving the degree of parallelism.
CN111199288A · Inactive · Publication Date: 2020-05-26 · SHAN DONG MSUN HEALTH TECH GRP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
SHAN DONG MSUN HEALTH TECH GRP CO LTD
Publication Date
2020-05-26
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention discloses a novel multi-head attention mechanism. It uses local attention rather than the global attention of the traditional multi-head attention mechanism, which reduces the complexity of the model. The size of every matrix in the computation is proportional only to the sequence length, whereas the traditional attention mechanism uses a matrix proportional to the square of the sequence length, so the storage consumption of the model is greatly reduced. Unlike the solution in Transformer-XL, the sequence is not split into blocks during computation, so the sequential characteristics of the original sequence are largely preserved. Softmax is used to establish global semantics, which improves the parallelism of the model compared with the cross-block connections used in Transformer-XL.
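The contrast between matrices that grow with the sequence length and matrices that grow with its square can be illustrated with a small sketch. The text available here does not disclose the exact computation of the claimed mechanism, so the code below is only a generic single-head sliding-window attention in NumPy, not the patented method. The function names, the `window` parameter and the masking scheme are all assumptions made for illustration.

```python
import numpy as np

def global_attention(q, k, v):
    """Standard scaled dot-product attention; the score matrix is (n, n)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, scores.shape

def local_attention(q, k, v, window=2):
    """Illustrative sliding-window attention (an assumption, not the patent's
    method): each position attends to at most 2*window+1 neighbours, so the
    score matrix is (n, 2*window+1), linear in n for a fixed window."""
    n, d = q.shape
    offsets = np.arange(-window, window + 1)          # (width,)
    idx = np.arange(n)[:, None] + offsets[None, :]    # (n, width) neighbour indices
    valid = (idx >= 0) & (idx < n)                    # mask positions off the ends
    idx = np.clip(idx, 0, n - 1)
    k_win, v_win = k[idx], v[idx]                     # (n, width, d)
    scores = np.einsum("nd,nwd->nw", q, k_win) / np.sqrt(d)
    scores = np.where(valid, scores, -np.inf)         # masked entries get zero weight
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("nw,nwd->nd", weights, v_win), scores.shape

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
out_g, shape_g = global_attention(q, k, v)
out_l, shape_l = local_attention(q, k, v, window=2)
print(shape_g, shape_l)
```

In this sketch the sequence is never cut into blocks: every position has its own window, and all rows are computed in one batched operation. When the window is wide enough to cover the whole sequence, the result coincides with global attention.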

Description

technical field

[0001] The invention relates to the technical fields of artificial intelligence, machine learning and data mining, and in particular to a novel multi-head attention mechanism.

background technology

[0002] With the continuous integration of artificial intelligence and machine learning into natural language processing, more and more deep learning techniques have been applied in the field. Among them, Transformer-based methods built on the multi-head attention mechanism, such as GPT, BERT, RoBERTa, ALBERT and XLNet, have been well received by the industry and are increasingly applied in natural language processing and other fields.

[0003] However, the original multi-head attention mechanism has inherent disadvantages. First, its memory footprint is proportional to the square of the length of the processed sequence, so its space complexity is high, which will...
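The quadratic growth described in [0003] can be made concrete with a short calculation. This is a rough sketch only: the head count of 8 and the 4-byte (float32) score values are assumptions chosen for illustration, and the helper `score_matrix_bytes` is hypothetical. It counts only the attention score matrices, not the rest of the model.

```python
def score_matrix_bytes(seq_len, num_heads=8, bytes_per_value=4):
    """Bytes needed to hold one seq_len x seq_len score matrix per head.

    num_heads and bytes_per_value are illustrative assumptions.
    """
    return num_heads * seq_len * seq_len * bytes_per_value

for n in (512, 2048, 8192):
    print(n, score_matrix_bytes(n) / 2**20, "MiB")
# prints:
# 512 8.0 MiB
# 2048 128.0 MiB
# 8192 2048.0 MiB
```

Multiplying the sequence length by 4 multiplies the storage for the score matrices by 16.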

Claims
