295 results about "Vector operations" patented technology

Vector operations: an extension of the laws of elementary algebra to vectors. They include addition, subtraction, and three types of multiplication. The sum of two vectors is a third vector, represented as the diagonal of the parallelogram constructed with the two original vectors as sides.
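
As a minimal sketch of the operations named above, the following Python functions implement addition, subtraction, and what are commonly taken to be the three types of multiplication (scalar multiple, dot product, cross product). The function names and the use of plain 3-tuples are illustrative assumptions, not something the listing specifies.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

def add(a: Vec3, b: Vec3) -> Vec3:
    # Component-wise sum: the diagonal of the parallelogram spanned by a and b.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a: Vec3, b: Vec3) -> Vec3:
    # Component-wise difference.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(k: float, a: Vec3) -> Vec3:
    # Multiplication of a vector by a scalar.
    return (k * a[0], k * a[1], k * a[2])

def dot(a: Vec3, b: Vec3) -> float:
    # Dot (scalar) product: yields a number.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    # Cross (vector) product: yields a vector perpendicular to a and b.
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
```

For example, `cross((1, 0, 0), (0, 1, 0))` gives the unit vector along the third axis, `(0, 0, 1)`.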

SIMD datapath coupled to scalar/vector/address/conditional data register file with selective subpath scalar processing mode

Inactive | US6839828B2 | Not compromise SIMD data processing performance | Reduce consumption | Register arrangements | Digital data processing details | Processor register | Operation mode

There is provided a processor designed to operate in a plurality of modes for processing vector and scalar instructions. Register files are each for storing scalar and vector data and address information. A parallel vector unit, coupled to the register files, includes functional units configurable to operate in a vector operation mode and a scalar operation mode. The vector unit includes an apparatus for tightly coupling the functional units to perform an operation specified by a current instruction. Under a vector operation mode, the vector unit performs, in parallel, a single vector operation on a plurality of data elements. The operations performed on the plurality of data elements are each performed by a different functional unit of the vector unit. Under a scalar operation mode, the vector unit performs a scalar operation on a data element received from the register files in a functional unit within the vector unit.

Owner:INTEL CORP

Quanton representation for emulating quantum-like computation on classical processors

Active | US20160328253A1 | Quantum computers | Mathematical models | Computational science | Theoretical computer science

The Quanton virtual machine approximates solutions to NP-Hard problems in factorial spaces in polynomial time. The data representation and methods emulate quantum computing on classical hardware but also implement quantum computing if run on quantum hardware. The Quanton uses permutations indexed by Lehmer codes and permutation-operators to represent quantum gates and operations. A generating function embeds the indexes into a geometric object for efficient compressed representation. A nonlinear directional probability distribution is embedded in the manifold, and the tangent space at each index point also carries a linear probability distribution. Simple vector operations on the distributions correspond to quantum gate operations. The Quanton provides features of quantum computing: superpositioning, quantization and entanglement surrogates. Populations of Quantons are evolved as local evolving gate operations solving problems or as solution candidates in an Estimation of Distribution algorithm. The Quanton representation and methods are fully parallel on any hardware.

Owner:KYNDI

System and method for performing compound vector operations

Inactive | US6192384B1 | Reduce bandwidth requirements | Minimize the number | Operational speed enhancement | Register arrangements | Operating instruction | Imaging processing

A processor particularly useful in multimedia applications such as image processing is based on a stream programming model and has a tiered storage architecture to minimize global bandwidth requirements. The processor has a stream register file through which the processor's functional units transfer streams to execute processor operations. Load and store instructions transfer streams between the stream register file and a stream memory; send and receive instructions transfer streams between stream register files of different processors; and operate instructions pass streams between the stream register file and computational kernels. Each of the computational kernels is capable of performing compound vector operations. A compound vector operation performs a sequence of arithmetic operations on data read from the stream register file, i.e., a global storage resource, and generates a result that is written back to the stream register file. Each function or compound vector operation is specified by an instruction sequence that specifies the arithmetic operations and data movements that are performed each cycle to carry out the compound operation. This sequence can, for example, be specified using microcode.

Owner:THE BOARD OF TRUSTEES OF THE LELAND +1

Scalar hardware for performing SIMD operations

Inactive | US6292886B1 | Digital computer details | Concurrent instruction execution | Processor register | Execution unit

A system for processing SIMD operands in a packed data format includes a scalar FMAC and a vector FMAC coupled to a register file through an operand delivery module. For vector operations, the operand delivery module bit steers a SIMD operand of the packed operand into an unpacked operand for processing by the first execution unit. Another SIMD operand is processed by the vector execution unit.

Owner:INTEL CORP

Multiplier-based processor-in-memory architectures for image and graphics processing

Inactive | US7167890B2 | Efficiently reconfigured | Negligible amount | Computation using non-contact making devices | Image memory management | Graphics | Computational science

A Processor-In-Memory (PIM) includes a digital accelerator for image and graphics processing. The digital accelerator is based on an ALU having multipliers for processing combinations of bits smaller than those in the input data (e.g., 4×4 adders if the input data are 8-bit numbers). The ALU implements various arithmetic algorithms for addition, multiplication, and other operations. A secondary processing logic includes adders in series and parallel to permit vector operations as well as operations on longer scalars. A self-repairing ALU is also disclosed.

Owner:UNIVERSITY OF ROCHESTER +2

Vectorization of dynamic-time-warping computation using data reshaping

Inactive | US20090150313A1 | Removing data dependency | Genetic models | Digital computer details | Distance matrix | Algorithm

A method for comparing data sequences includes accepting first and second data sequences of data elements. A distance matrix is computed. The matrix includes rows and columns of matrix elements, describing distances between the data elements of the first sequence and the data elements of the second data sequence. The distance matrix is reshaped by applying successive, incremental shifts to the rows or columns so as to produce a reshaped matrix. A best-score path through the reshaped matrix is calculated using vector operations, so as to quantify a similarity between the first and second data sequences. Due to vectorization, a significant increase in computation speed is achieved in both software and hardware implementations.
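
The reason reshaping helps can be sketched in Python: in the usual dynamic-time-warping recurrence, every cell on one anti-diagonal of the cost matrix depends only on the two previous anti-diagonals, so once the matrix is shifted so that anti-diagonals line up, a whole diagonal can be updated at once. The sketch below is an illustration under assumptions, not the patented method: it indexes by anti-diagonal instead of physically shifting rows, uses absolute difference as the distance, and emulates the vector update with an inner loop whose iterations are independent of each other. The function names are hypothetical.

```python
import math
from typing import Sequence

def dtw_reference(a: Sequence[float], b: Sequence[float]) -> float:
    # Textbook row-by-row DTW; each cell depends on its left neighbour,
    # which blocks vectorization along a row.
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def dtw_antidiagonal(a: Sequence[float], b: Sequence[float]) -> float:
    # Same recurrence, evaluated one anti-diagonal (k = i + j) at a time.
    n, m = len(a), len(b)
    prev2 = [math.inf] * (n + 1)   # diagonal k - 2, indexed by row i
    prev1 = [math.inf] * (n + 1)   # diagonal k - 1, indexed by row i
    prev1[0] = 0.0                 # diagonal 0 holds only the origin cell
    for k in range(1, n + m + 1):
        cur = [math.inf] * (n + 1)
        lo, hi = max(1, k - m), min(n, k - 1)
        # No iteration of this loop reads a value written by another one,
        # so on SIMD hardware it could be a single vector operation.
        for i in range(lo, hi + 1):
            j = k - i
            d = abs(a[i - 1] - b[j - 1])
            cur[i] = d + min(prev1[i - 1], prev1[i], prev2[i - 1])
        prev2, prev1 = prev1, cur
    return prev1[n]
```

Both functions return the same score for the same inputs; only the order of evaluation differs.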

Owner:IBM CORP

Optimized Scalar Promotion with Load and Splat SIMD Instructions

Inactive | US20090307656A1 | General purpose stored program computer | Specific program execution arrangements | SIMD | Vector operations

Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

Owner:IBM CORP

Unbalanced voltage compensation method, unbalanced voltage compensator, three-phase converter control method, and controller of three-phase converter

Active | US20110134669A1 | Ac-dc conversion without reversal | Conversion with intermediate conversion to dc | Control signal | Three phase converter

To compensate for unbalanced three-phase AC voltages, instantaneous values of wye-phase voltages 120° out of phase with each other are obtained from the line voltages using a centroid vector operation. Symmetrical component voltages of a three-phase balanced system are obtained from these instantaneous wye-phase voltages, and a compensation signal for the unbalanced voltages is generated from the zero-phase-sequence voltage of the symmetrical component voltages. Compensated wye-phase voltages 120° out of phase with each other are then obtained from the compensation signal and the symmetrical component voltages, a control signal for PWM conversion is generated based on the compensated wye-phase voltages, and the unbalanced three-phase AC voltages are thereby compensated. This shortens the time required to detect a voltage unbalance and generate a control signal.

Owner:KYOSAN ELECTRIC MFG CO LTD

Compilation for a SIMD RISC processor

Inactive | US20070124722A1 | Software engineering | General purpose stored program computer | Data processing system | Scalar Value

A computer implemented method, data processing system, and computer usable code are provided for generating code to perform scalar computations on a Single-Instruction Multiple-Data (SIMD) Reduced Instruction Set Computer (RISC) architecture. The illustrative embodiments generate code directed at loading at least one scalar value and generate code using at least one vector operation to generate a scalar result, wherein all scalar computation for integer and floating point data is performed in a SIMD vector execution unit.

Owner:IBM CORP

System and method of processing data using scalar/vector instructions

Active | US20080046683A1 | Improve performance | Reduce power consumption | Conditional code generation | General purpose stored program computer | Processor register | Condition Code

A processor device is disclosed that includes a register file with a combined condition code register for scalar and vector operations. The processor device utilizes the combined condition code register for scalar and vector operations. Further, a compare operation can store resulting bits in the combined condition code register and a conditional operation can utilize the combined condition code register bits for evaluating a condition.

Owner:QUALCOMM INC

Adaptive Primary-Ambient Decomposition of Audio Signals

Active | US20090252341A1 | Promote decomposition | Prevent leakage | Speech analysis | Line-transmission | Kernel principal component analysis | Decomposition

A stereo audio signal is processed to determine primary and ambient components by transforming the signal into vectors corresponding to subband signals, and decomposing the left and right channel vectors into ambient and primary components by matrix and vector operations. Principal component analysis is used to determine a primary component unit vector, and ambience components are determined according to a correlation-based cross-fade or an orthogonal basis derivation.

Owner:CREATIVE TECH CORP

Bit-width allocation for scientific computations

Active | US20120023149A1 | Digital computer details | Computation using denominational number representation | Computer resources | Computational science

Methods and devices for automatically determining a suitable bit-width for data types to be used in computer resource intensive computations. Methods for range refinement for intermediate variables and for determining suitable bit-widths for data to be used in vector operations are also presented. The invention may be applied to various computing devices such as CPUs, GPUs, FPGAs, etc.

Owner:MCMASTER UNIV

Vector floating-point computing device and method based on vector computing

Inactive | CN102262525A | Digital data processing details | Concurrent instruction execution | Hardware structure | Coprocessor

The invention discloses a vector-operation-based floating-point computing device with a novel hardware structure. It comprises a vector processor, a storage device, a vector floating-point coprocessor and a vector floating-point coprocessor storage device, wherein the bus interface between the vector processor and the vector floating-point coprocessor can adopt a general coprocessor bus structure. While still supporting all floating-point operations, the coprocessor improves floating-point operation speed and reduces design complexity.

Owner:孙瑞玮

Vector calculating device

Active | CN106990940A | Improve execution performance | Simple format | Register arrangements | Complex mathematical operations | Processor register | Scratchpad memory

The invention provides a vector calculating device comprising a memory cell, a register unit and a vector operation unit. Vectors are stored in the memory cell, and the addresses at which the vectors are stored are held in the register unit. The vector operation unit obtains a vector address from the register unit according to a vector operation instruction, then obtains the corresponding vector from the memory cell according to the vector address, and carries out the vector operation on the obtained vector to produce a vector operation result. According to the invention, vector data participating in calculation is temporarily stored in a scratchpad memory, data of different widths can be supported flexibly and effectively during the vector operation process, and the execution performance of tasks including a large number of vector calculations is improved.

Owner:CAMBRICON TECH CO LTD

Handling permanent and transient errors using a SIMD unit

Inactive | US20060190700A1 | Error detection/correction | General purpose stored program computer | Scalar Value | Execution unit

A method for handling permanent and transient errors in a microprocessor is disclosed. The method includes reading a scalar value and a scalar operation from an execution unit of the microprocessor. The method further includes writing a copy of the scalar value into each of a plurality of elements of a vector register of a Single Instruction Multiple Data (SIMD) unit of the microprocessor and executing the scalar operation on each scalar value in each of the plurality of elements of the vector register of the SIMD unit using a vector operation. The method further includes comparing each result of the scalar operation on each scalar value in each of the plurality of elements of the vector register and detecting a permanent or transient error if all of the results are not identical.
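
The replicate-execute-compare idea in this abstract can be sketched in software. The Python below is only an illustration: the lane count, the function names, and the `lane_fault` hook used to simulate a faulty lane are all hypothetical and do not come from the patent.

```python
from typing import Callable, List, Optional, Tuple

def splat(value: int, lanes: int) -> List[int]:
    # Copy one scalar into every element of a (simulated) vector register.
    return [value] * lanes

def checked_scalar_op(
    op: Callable[[int], int],
    value: int,
    lanes: int = 4,
    lane_fault: Optional[Tuple[int, Callable[[int], int]]] = None,
) -> int:
    # Run the same scalar operation in every lane, as one vector operation would.
    results = [op(x) for x in splat(value, lanes)]
    # Hypothetical fault injection: corrupt the result of one lane.
    if lane_fault is not None:
        index, corrupt = lane_fault
        results[index] = corrupt(results[index])
    # Any disagreement between lanes signals a permanent or transient error.
    if any(r != results[0] for r in results):
        raise RuntimeError(f"lane results differ: {results}")
    return results[0]
```

Calling `checked_scalar_op(lambda x: x * 3, 7)` returns `21`, while passing `lane_fault=(2, lambda r: r ^ 1)` makes lane 2 disagree with the others and raises `RuntimeError`.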

Owner:IBM CORP

Operation device and method of accelerating chip which accelerates depth neural network algorithm

Active | CN106529668A | Extended waiting time | Consume more | Physical realisation | Neural learning methods | Synaptic weight | Algorithm

The invention provides an operation device and method for an accelerator chip that accelerates deep neural network algorithms. The device comprises a vector addition processor module, which carries out vector addition or subtraction and/or the vector operations of the pooling layer algorithm in the deep neural network algorithm; a vector function value operator module, which carries out the vector operations of nonlinear evaluation in the deep neural network algorithm; and a vector multiply-adder module, which carries out multiply-add operations on vectors. The three modules execute programmable instructions and interact with each other to calculate the neural network output result and the synaptic weight change amount, which represents the strength of neuron interaction between neural layers. Each of the three modules is provided with an intermediate value storage area and carries out read and write operations on the main memory. As a result, intermediate values are read from and written to the main memory less often, the energy consumption of the accelerator chip is reduced, and data loss and replacement problems during data processing are avoided.

Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods

Active | US20140280420A1 | Save extra space | Fast processing efficiency | Digital data processing details | Digital computer details | Fourier transform on finite groups | Datapath

Vector processing engines (VPEs) having programmable data path configurations for providing multi-mode Radix-2X butterfly vector processing circuits. Related vector processors, systems, and methods are also disclosed. The VPEs disclosed herein include a plurality of vector processing stages each having vector processing blocks that have programmable data path configurations for performing Radix-2X butterfly vector operations to perform Fast Fourier Transform (FFT) vector processing operations efficiently. The data path configurations of the vector processing blocks can be programmed to provide different types of Radix-2X butterfly vector operations as well as other arithmetic logic vector operations. As a result, fewer VPEs can provide desired Radix-2X butterfly vector operations and other types of arithmetic logic vector operations in a vector processor, thus saving area in the vector processor while still retaining vector processing advantages of fewer register writes and faster vector instruction execution times over scalar processing engines.

Owner:QUALCOMM INC

Data processing apparatus and method for handling vector instructions

Active | US20100312988A1 | Complex nesting | Increase profit | General purpose stored program computer | Program control | Processor register | Data element

A data processing apparatus and method are provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes. The vector processing unit is responsive to a vector operation instruction to perform an operation in parallel on data elements input to the plurality of lanes of parallel processing, but to exclude from the performance of the operation any lane whose associated skip indicator is set. This allows the operation specified by vector instructions to be performed conditionally within each of the lanes of parallel processing without any modification to the vector instructions that are specifying those operations.
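
A small software model may help make the per-lane skip indicator concrete. The following Python sketch is an assumption-laden illustration: the class and method names are hypothetical, and leaving the destination element unchanged in a skipped lane is this sketch's choice, which the abstract does not spell out.

```python
from typing import Callable, Iterable, List

class SkipVectorUnit:
    """Toy model of a vector unit with one skip indicator per lane."""

    def __init__(self, lanes: int) -> None:
        self.skip: List[bool] = [False] * lanes

    def vskip(self, lanes_to_skip: Iterable[int]) -> None:
        # Models a vector skip instruction: set the indicator for chosen lanes.
        for i in lanes_to_skip:
            self.skip[i] = True

    def vclear(self) -> None:
        # Re-enable every lane (hypothetical helper).
        self.skip = [False] * len(self.skip)

    def vop(
        self,
        op: Callable[[int, int], int],
        dst: List[int],
        a: List[int],
        b: List[int],
    ) -> List[int]:
        # Models a vector operation instruction: lanes whose skip indicator
        # is set are excluded and keep their old destination value.
        return [d if s else op(x, y) for d, s, x, y in zip(dst, self.skip, a, b)]
```

With lanes 1 and 3 skipped, an element-wise add of `[1, 2, 3, 4]` and `[10, 20, 30, 40]` into a zeroed destination yields `[11, 0, 33, 0]`. The add itself is written without any reference to the skip state, which mirrors the abstract's point that the vector instructions need no modification.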

Owner:ARM LTD

Reconfigurable parallel execution and load-store slice processing methods

Active | US20160202991A1 | Memory architecture accessing/allocation | Instruction analysis | Hardware thread | Control signal

A method of operating a processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.

Owner:IBM CORP

System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine

Inactive | US20080229066A1 | Low cost | Increase choice | Register arrangements | General purpose stored program computer | Dynamic compilation | Load instruction

A system, method, and computer program product are provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.

Owner:INT BUSINESS MASCH CORP

Efficient Texture Processing of Pixel Groups with SIMD Execution Unit

Active | US20090187734A1 | Increase profit | Maximize utilization | Cathode-ray tube indicators | Processor architectures/configuration | Execution unit | Texture processing

A circuit arrangement and method perform concurrent texture processing of groups of pixels with a single instruction multiple data (SIMD) execution unit to improve the utilization of the SIMD execution unit when performing scalar operations associated with a texture processing algorithm. In addition, when utilized in connection with a multi-threaded SIMD execution unit, groups of pixels may be concurrently processed in different threads executed by the SIMD execution unit to further maximize the utilization of the SIMD execution unit by reducing the adverse effects of dependencies in scalar and/or vector operations incorporated into a texture processing algorithm.

Owner:RAKUTEN GRP INC

Processor reduction unit for accumulation of multiple operands with or without saturation

Active | US20050071413A1 | Performed quickly | Easily pipelined | Digital computer details | Concurrent instruction execution | Combined use | Operand

A processor having a reduction unit that sums m input operands plus an accumulator value, with the option of saturating after each addition or wrapping around the result of each addition. The reduction unit also allows the m input operands to be subtracted from the accumulator value by simply inverting the bits of the input operands and setting a carry into each of a plurality of reduction adders to one. The reduction unit can be used in conjunction with m parallel multipliers to quickly perform dot products and other vector operations with either saturating or wrap-around arithmetic.
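
The arithmetic described here can be modelled in a few lines. The Python sketch below assumes a 16-bit two's-complement datapath and uses hypothetical function names; it shows saturation or wrap-around applied after each addition, and subtraction performed as "invert the operand bits and set the carry-in to one", which relies on the identity acc - x = acc + (~x) + 1.

```python
from typing import Iterable

BITS = 16                        # assumed operand width
MASK = (1 << BITS) - 1
MAX_S = (1 << (BITS - 1)) - 1    # 32767
MIN_S = -(1 << (BITS - 1))       # -32768

def to_signed(pattern: int) -> int:
    # Interpret the low BITS bits as a two's-complement number.
    pattern &= MASK
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

def add_step(acc: int, pattern: int, carry_in: int, saturate: bool) -> int:
    # One reduction adder: accumulator + operand + carry-in.
    total = acc + to_signed(pattern) + carry_in
    if saturate:
        return max(MIN_S, min(MAX_S, total))   # clamp after this addition
    return to_signed(total)                    # wrap around after this addition

def reduction_unit(acc: int, operands: Iterable[int],
                   subtract: bool = False, saturate: bool = True) -> int:
    for x in operands:
        pattern = x & MASK
        if subtract:
            # Invert the operand bits and feed a carry-in of one.
            acc = add_step(acc, ~pattern & MASK, 1, saturate)
        else:
            acc = add_step(acc, pattern, 0, saturate)
    return acc
```

A dot product follows by feeding the products of m parallel multipliers into the unit, for example `reduction_unit(0, [x * y for x, y in zip((1, 2, 3), (4, 5, 6))])`, which gives `32`. In this model the products are assumed to fit in 16 bits.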

Owner:QUALCOMM INC

System and method of processing data using scalar/vector instructions

Active | US7676647B2 | Improve performance | Reduce power consumption | Conditional code generation | General purpose stored program computer | Processor register | Condition Code

A processor device is disclosed that includes a register file with a combined condition code register for scalar and vector operations. The processor device utilizes the combined condition code register for scalar and vector operations. Further, a compare operation can store resulting bits in the combined condition code register and a conditional operation can utilize the combined condition code register bits for evaluating a condition.

Owner:QUALCOMM INC

Method and apparatus for instruction execution in a data processing system

Inactive | US6795908B1 | Instruction analysis | General purpose stored program computer | Data processing system | Data stream

A method for processing scalar and vector executions, where vector executions may be "true" vector operations, CVA, or pseudo-vector operations, PVA. All three types of executions are processed using one architecture. In one embodiment, a compiler analyzes code to identify sections that are vectorizable, and applies either CVA, PVA, or a combination of the two to process these sections. Register overlay is provided for storing load address information and data in PVA mode. Within each CVA and PVA instruction, enable bits describe the data streaming function of the operation. A temporary memory, TM, accommodates variable size vectors, and is used in vector operations, similar to a vector register, to store temporary vectors.

Owner:NVIDIA CORP

Processor reduction unit for accumulation of multiple operands with or without saturation

Active | US7593978B2 | Performed quickly | Easily pipelined | Digital computer details | Concurrent instruction execution | Operand | Wrap around

A processor having a reduction unit that sums m input operands plus an accumulator value, with the option of saturating after each addition or wrapping around the result of each addition. The reduction unit also allows the m input operands to be subtracted from the accumulator value by simply inverting the bits of the input operands and setting a carry into each of a plurality of reduction adders to one. The reduction unit can be used in conjunction with m parallel multipliers to quickly perform dot products and other vector operations with either saturating or wrap-around arithmetic.

Owner:QUALCOMM INC
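
The accumulation scheme described in the reduction-unit abstract above can be sketched in software. This is a minimal illustration, not the patented hardware: the function names, the 16-bit default width, and the sequential loop are assumptions made for the example. It shows the two overflow policies (saturate or wrap after each addition) and subtraction performed as "invert the operand bits and add a carry-in of one".

```python
def wrap(value, bits):
    """Wrap-around: keep the low `bits` bits, read as two's complement."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def saturate(value, bits):
    """Saturation: clamp to the signed range of a `bits`-wide register."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def reduce_operands(acc, operands, bits=16, saturating=True, subtract=False):
    """Add (or subtract) every operand to the accumulator value.

    The overflow policy is applied after each individual addition.
    Subtraction is done as acc + (~x) + 1, i.e. inverted bits plus a
    carry-in of one, which equals acc - x in two's complement.
    """
    fix = saturate if saturating else wrap
    for x in operands:
        term = (~x + 1) if subtract else x
        acc = fix(acc + term, bits)
    return acc


def dot(a, b, acc=0, bits=16, saturating=True):
    """Dot product: element-wise multiplies feeding the reduction."""
    products = [x * y for x, y in zip(a, b)]
    return reduce_operands(acc, products, bits, saturating)
```

Because the policy is applied after every addition, the saturating and wrapping modes can give different results even when the exact sum fits in the register: starting from 32000 and adding 1000 then -1000 in 16 bits yields 31767 when saturating and 32000 when wrapping.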

Extraction of left/center/right information from two-channel stereo sources

Active | US7542815B1 | Multiplex communication | Mechanical record carriers | Time domain | Vocal tract

A digital audio signal processing system and method transforms two-channel stereo time-domain data into the frequency domain. Vector operations are performed upon the frequency-domain data by which signal components unique to one of the input channels are routed to one of the output channels, signal components unique to the other of the input channels are routed to another of the output channels, and signal components common to both channels are routed to a third and optionally to a fourth output channel. The frequency-domain output channels are then transformed back into the time-domain, forming an equivalent number of channels of output audio data. The vector operations are performed in a manner that preserves the overall information content of the input data.

Owner:AKITA BLUE

Operation device and related product

Active | CN107861757A | Function increase | Easy to use | Concurrent instruction execution | Control unit | Power consumption

The invention provides an operation device. The operation device is used for executing operation according to an extended instruction and comprises a storage, an operation unit and a control unit; the extended instruction comprises an operation code and an operation domain, and the storage is used for storing a vector; the control unit is used for obtaining the extended instruction, analyzing the extended instruction to obtain a vector operation instruction and a second operation instruction, determining the calculation sequence of the vector operation instruction and the second operation instruction according to the vector operation instruction and the second operation instruction and reading an input vector corresponding to an input vector address from the storage; the operation unit is used for executing the vector operation instruction and the second operation instruction on the input vector according to the calculation sequence to obtain a result of the extended instruction. The operation device has the advantages of being low in power consumption and small in calculation expenditure.

Owner:SHANGHAI CAMBRICON INFORMATION TECH CO LTD
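
The control flow in the operation-device abstract above (decode one extended instruction into a vector operation and a second operation, fix their order, fetch the input vector, execute) can be sketched as follows. Everything concrete here is hypothetical and chosen only for illustration: the opcodes `SCALE_ABS` and `ABS_SCALE`, the decode table, the two operations, and the dictionary standing in for the storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedInstruction:
    opcode: str        # hypothetical fused opcode
    input_addr: int    # operation domain: address of the input vector
    scalar: float      # operation domain: scalar parameter


# Hypothetical operations; each takes (vector, scalar) and returns a vector.
OPS = {
    "scale": lambda v, k: [k * x for x in v],
    "abs": lambda v, k: [abs(x) for x in v],
}

# Hypothetical decode table: opcode -> (vector op, second op, vector op first?)
DECODE_TABLE = {
    "SCALE_ABS": ("scale", "abs", True),
    "ABS_SCALE": ("scale", "abs", False),
}


def control_unit(instr, storage):
    """Split the extended instruction, decide the order, read the input."""
    vector_op, second_op, vector_first = DECODE_TABLE[instr.opcode]
    order = [vector_op, second_op] if vector_first else [second_op, vector_op]
    return order, storage[instr.input_addr]


def operation_unit(order, vector, scalar):
    """Run both operations on the input vector in the decided order."""
    for name in order:
        vector = OPS[name](vector, scalar)
    return vector


def execute(instr, storage):
    order, vector = control_unit(instr, storage)
    return operation_unit(order, vector, instr.scalar)
```

With the vector `[1.0, -2.0, 3.0]` stored at address 16 and a scalar of -2.0, `SCALE_ABS` gives `[2.0, 4.0, 6.0]` while `ABS_SCALE` gives `[-2.0, -4.0, -6.0]`, which is why the calculation sequence has to be determined before execution.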

Computer simulation of body dynamics including a solver that solves in linear time for a set of constraints using vector processing

Active | US20060262114A1 | Programme control | Indoor games | Linear correlation | Body dynamics

Computer simulation of the dynamics of rigid bodies interacting through collisions, stacks and joints is performed using a constraint-based system in which constraints are defined in terms of the positions of the bodies. Displacements caused by reaction forces necessary to ensure that the bodies comply with the position constraints can be calculated and can be done iteratively by updating equations defining the reaction forces and the displacements such that the computation time and memory resources required to perform the calculations is linearly dependent upon the number of bodies and the number of contacts and joints between the bodies. Computational requirements and memory requirements are reduced further by performing the calculations using vector operations.

Owner:ELECTRONIC ARTS INC
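
The idea of iteratively correcting positions until constraints hold, at a cost per pass that grows linearly with the number of constraints, can be illustrated with a much simpler system than the one in the body-dynamics abstract above. The sketch below is an assumption-laden stand-in, not the patented solver: it uses 2D point masses of equal weight instead of rigid bodies, only fixed-distance constraints, and plain scalar arithmetic rather than vector instructions.

```python
import math


def solve_distance_constraints(positions, constraints, iterations=30):
    """Iteratively project points onto fixed-distance constraints.

    positions:   list of (x, y) tuples
    constraints: list of (i, j, rest_length) tuples
    Each pass visits every constraint once, so the work per pass is
    linear in the number of constraints.
    """
    pos = [list(p) for p in positions]
    for _ in range(iterations):
        for i, j, rest in constraints:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # direction undefined; skip this constraint
            # Split the displacement equally between the two points.
            corr = 0.5 * (dist - rest) / dist
            pos[i][0] += corr * dx
            pos[i][1] += corr * dy
            pos[j][0] -= corr * dx
            pos[j][1] -= corr * dy
    return [tuple(p) for p in pos]
```

A single constraint is satisfied after one pass; a chain of constraints that share points needs several passes because fixing one constraint disturbs its neighbour.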

Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization

Inactive | US20050283775A1 | Software engineering | Computation using non-contact making devices | Information processing | SIMD vectorization

A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory, is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

Owner:IBM CORP
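
The aggregation step in the SIMD-vectorization abstract above can be pictured with a small example. This is a simplified sketch under stated assumptions, not the compiler framework itself: the loop body holds three isomorphic additions with stride-three accesses that together cover a contiguous stream, the hardware vector length is taken to be 4 elements, and Python list slices stand in for SIMD registers.

```python
from math import gcd


def scalar_loop(b, c, n):
    """Original loop: three isomorphic statements, each with stride 3."""
    a = [0] * (3 * n)
    for i in range(n):
        a[3 * i] = b[3 * i] + c[3 * i]
        a[3 * i + 1] = b[3 * i + 1] + c[3 * i + 1]
        a[3 * i + 2] = b[3 * i + 2] + c[3 * i + 2]
    return a


def aggregated_loop(b, c, n, virtual_len=3, hw_len=4):
    """Same computation, regrouped into hardware-width chunks.

    The three statements form one virtual vector of length 3 per
    iteration. Aggregating lcm(3, 4) / 3 = 4 iterations gives 12
    contiguous elements, i.e. exactly three 4-wide operations.
    """
    iters_per_block = hw_len // gcd(virtual_len, hw_len)
    block = virtual_len * iters_per_block
    total = virtual_len * n
    a = [0] * total
    main_end = total - total % block
    # Main part: each slice models one hardware vector operation.
    for start in range(0, main_end, hw_len):
        end = start + hw_len
        a[start:end] = [x + y for x, y in zip(b[start:end], c[start:end])]
    # Remainder iterations that do not fill a whole block stay scalar.
    for k in range(main_end, total):
        a[k] = b[k] + c[k]
    return a
```

For `n = 10` the stream has 30 elements: 24 are handled by six 4-wide operations and the last 6 by the scalar remainder loop, and the result matches the original loop.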

Reconfigurable parallel execution and load-store slice processor

Active | US20160202989A1 | Memory architecture accessing/allocation | Instruction analysis | Hardware thread | Mode control

A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and / or vector operations, according to one or more mode control signal that also serves as a configuration control signal. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.

Owner:IBM CORP
