Patents
Literature
Patsnap Copilot is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Patsnap Copilot

57 results about "Multiplier accumulator" patented technology

In computing, especially digital signal processing, the multiply–accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator.The hardware unit that performs the operation is known as a multiplier–accumulator (MAC, or MAC unit); the operation itself is also often called a MAC or a MAC operation.

High speed and efficient matrix multiplication hardware module

A matrix multiplication module and matrix multiplication method are provided that use a variable number of multiplier-accumulator units based on the amount of data elements of the matrices are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more multiplier-accumulator units are used to perform the necessary multiplication and addition operations. To multiply an N×M matrix by an M×N matrix, the total (maximum) number of used MAC units is “2*N−1”. The number of MAC units used starts with one (1) and increases by two at each computation stage, that is, at the beginning of reading of data elements for each new row of the first matrix. The sequence of the number of MAC units is {1, 3, 5, . . . , 2*N−1} for computation stages each of which corresponds to reading of data elements for each new row of the left hand matrix, also called the first matrix. For the multiplication of two 8×8 matrices, the performance is 16 floating point operations per clock cycle. For an FPGA running at 100 MHz, the performance is 1.6 Giga floating point operations per second. The performance increases with the increase of the clock frequency and the use of larger matrices when FPGA resources permit. Very large matrices are partitioned into smaller blocks to fit in the FPGA resources. Results from the multiplication of sub-matrices are combined to form the final result of the large matrices.
Owner:HARRIS CORP

Convolutional neural network acceleration method and apparatus

The invention relates to a convolutional neural network acceleration method and apparatus. The method comprises the steps of reading an input feature graph of a convolutional layer; inputting the input feature graph to a processing array group of the convolutional layer, and according to a first reference pixel vector, a second reference pixel vector, a convolution kernel weight and finished inputchannel data, performing convolution multiplication and addition operation by utilizing a propagate partial multiplier-accumulator to obtain an output result of the processing array group; accordingto the output result of the processing array group, obtaining an output feature graph of the convolutional layer; writing the output feature graph of the final layer of the convolutional layer in an input cache of a full connection layer; executing the multiplication and addition operation by the full connection layer according to the output feature graph of the final layer of the convolutional layer to obtain an output eigenvector of the full connection layer; and outputting the output eigenvector of the final layer of the full connection layer to a fourth block of a first memory. The hardware resources and the power consumption are effectively reduced; and the processing speed of a convolutional neural network is increased.
Owner:TSINGHUA UNIV

High Speed and Efficient Matrix Multiplication Hardware Module

A matrix multiplication module and matrix multiplication method are provided that use a variable number of multiplier-accumulator units based on the amount of data elements of the matrices are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more multiplier-accumulator units are used to perform the necessary multiplication and addition operations. To multiply an N×M matrix by an M×N matrix, the total (maximum) number of used MAC units is “2*N−1”. The number of MAC units used starts with one (1) and increases by two at each computation stage, that is, at the beginning of reading of data elements for each new row of the first matrix. The sequence of the number of MAC units is {1, 3, 5, . . . , 2*N−1} for computation stages each of which corresponds to reading of data elements for each new row of the left hand matrix, also called the first matrix. For the multiplication of two 8×8 matrices, the performance is 16 floating point operations per clock cycle. For an FPGA running at 100 MHz, the performance is 1.6 Giga floating point operations per second. The performance increases with the increase of the clock frequency and the use of larger matrices when FPGA resources permit. Very large matrices are partitioned into smaller blocks to fit in the FPGA resources. Results from the multiplication of sub-matrices are combined to form the final result of the large matrices.
Owner:HARRIS CORP

Programmable logic integrated circuit devices including dedicated processor components and hard-wired functional units

A programmable logic integrated circuit device (“PLD”) includes programmable logic and a dedicated (i.e., at least partly hard-wired) processor object (or at least a high-functionality functional unit) for performing or at least helping to perform tasks that it is unduly inefficient to implement in the more general-purpose programmable logic and/or that, if implemented in the programmable logic, would operate unacceptably or at least undesirably slowly. The processor object includes an operating portion and a program sequencer that retrieves or at least helps to retrieve instructions for controlling or at least partly controlling the operating portion. The processor object may also include an address generator and/or a multi-ported register file for generating or at least helping to generate addresses of data on which the operating portion is to operate and/or destinations of data output by the operating portions. Examples of typical operating portions include multiplier-accumulators, arithmetic logic units, barrel shifters, and DSP circuitry of these or other kinds. The PLD may be provided with the capability to allow programs to be written for the device using local or “relative” addresses, and to automatically convert these addresses to actual or “absolute” addresses when the programs are actually performed by the device.
Owner:ALTERA CORP
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products