Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

96 results about "Simd processor" patented technology

Simty is a massively multi-threaded processor core that dynamically assembles SIMD instructions from scalar multi-threaded code. It runs the RISC-V (RV32-I) instruction set. Unlike existing SIMD or SIMT processors like GPUs, Simty runs binaries compiled for general-purpose processors without any instruction set extension or compiler changes.

SIMD processor and addressing method

A single instruction, multiple data (SIMD) processor including a plurality of addressing register sets, used to flexibly calculate effective operand source and destination memory addresses is disclosed. Two or more address generators calculate effective addresses using the register sets. Each register set includes a pointer register, and a scale register. An address generator forms effective addresses from a selected register set's pointer register and scale register; and an offset. For example, the effective memory address may be formed by multiplying the scale value by an offset value and summing the pointer and the scale value multiplied by the offset value.
Owner:AVAGO TECH INT SALES PTE LTD

Flexible vector modes of operation for SIMD processor

In addition to the usual modes of SIMD processor operation, where corresponding elements of two source vector registers are used as input pairs to be operated upon by the execution unit, or where one element of a source vector register is broadcast for use across the elements of another source vector register, the new system provides several other modes of operation for the elements of one or two source vector registers. Improving upon the time-costly moving of elements for an operation such as DCT, the present invention defines a more general set of modes of vector operations. In one embodiment, these new modes of operation use a third vector register to define how each element of one or both source vector registers are mapped, in order to pair these mapped elements as inputs to a vector execution unit. Furthermore, the decision to write an individual vector element result to a destination vector register, for each individual element produced by the vector execution unit, may be selectively disabled, enabled, or made to depend upon a selectable condition flag or a mask bit.
Owner:MIMAR TIBET

Method and apparatus for enable/disable control of SIMD processor slices

Methods and apparatus provide for disabling at least some data path processing circuits of a SIMD processing pipeline, in which the processing circuits are organized into a matrix of slices and stages, in response to one or more enable flags during a given cycle.
Owner:SONY COMPUTER ENTERTAINMENT INC

Method for performing random read access to a block of data using parallel lut read instruction in vector processors

This invention deals with the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Owner:TEXAS INSTR INC

Method for programmable motion estimation in a SIMD processor

The present invention provides a 16×16-sliding window using vector register file with zero overhead for horizontal or vertical shifts to incorporate motion estimation into SIMD vector processor architecture. SIMD processor's vector load mechanism, vector register file with shifting of elements capability, and 16×16 parallel SAD calculation hardware and instruction are used. Vertical shifts of all sixteen-vector registers occur in a ripple-through fashion when the end vector register is loaded. The parallel SAD calculation hardware can calculate one 16-by-16-block match per clock cycle in a pipelined fashion. In addition, hardware for best-match SAD value comparisons and maintaining their pixel location reduces the software overhead. Block matching for less than 16 by 16 block areas is supported using a mask register to mask selected elements, thereby reducing search area to any block size less than 16 by 16.
Owner:MIMAR TIBET

Fast and flexible scan conversion and matrix transpose in a SIMD processor

The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The present method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transpose for 2×2, 4×4, 8×8 blocks commonly used by all video compression and decompression algorithms.
Owner:MIMAR TIBET

System and method for processing image data relative to a focus of attention within the overall image

This invention provides a system and method for processing discrete image data within an overall set of acquired image data based upon a focus of attention within that image. The result of such processing is to operate upon a more limited subset of the overall image data to generate output values required by the vision system process. Such output value can be a decoded ID or other alphanumeric data. The system and method is performed in a vision system having two processor groups, along with a data memory that is smaller in capacity than the amount of image data to be read out from the sensor array. The first processor group is a plurality of SIMD processors and at least one general purpose processor, co-located on the same die with the data memory. A data reduction function operates within the same clock cycle as data-readout from the sensor to generate a reduced data set that is stored in the on-die data memory. At least a portion of the overall, unreduced image data is concurrently (in the same clock cycle) transferred to the second processor while the first processor transmits at least one region indicator with respect to the reduced data set to the second processor. The region indicator represents at least one focus of attention for the second processor to operate upon.
Owner:COGNEX CORP

Volume rendering apparatus and process

A computer automated process is presented for accelerating the rendering of sparse volume data on Graphics Processing Units (GPUs). GPUs are typically SIMD processors, and thus well suited to processing continuous data and not sparse data. The invention allows GPUs to process sparse data efficiently through the use of scatter-gather textures. The invention can be used to accelerate the rendering of sparse volume data in medical imaging or other fields.
Owner:TOSHIBA MEDICAL VISUALIZATION SYST EURO

Systems and methods for accelerating sub-pixel interpolation in video processing applications

ActiveUS20070070080A1Increase precision supported for all operationImprove accuracyGeometric image transformationCathode-ray tube indicatorsVideo processingDiagonal
A data path for a SIMD-based microprocessor is used to perform different simultaneous filter sub-operations in parallel data lanes of the SIMD-based microprocessor. Filter operations for sub-pixel interpolation are performed simultaneously on separate lanes of the SIMD processor's data path. Using a dedicated internal data path, precision higher than the native precision of the SIMD unit may be achieved. Through the data path according to this invention, a single instruction may be used to generate the value of two adjacent sub-pixels located diagonally with respect to integer pixel positions.
Owner:SYNOPSYS INC

SIMD processor with scalar arithmetic logic units

A scalar processor that includes a plurality of scalar arithmetic logic units and a special function unit. Each scalar unit performs, in a different time interval, the same operation on a different data item, where each different time interval is one of a plurality of successive, adjacent time intervals. Each unit provides an output data item in the time interval in which the unit performs the operation and provides a processed data item in the last of the successive, adjacent time intervals. The special function unit provides a special function computation for the output data item of a selected one of the scalar units, in the time interval in which the selected scalar unit performs the operation, so as to avoid a conflict in use among the scalar units. A vector processing unit includes an input data buffer, the scalar processor, and an output orthogonal converter.
Owner:S3 GRAPHICS

Histogram generation with vector operations in SIMD and VLIW processor by consolidating LUTs storing parallel update incremented count values for vector data elements

The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.
Owner:MIMAR TIBET

SIMD processor executing min/max instructions

A SIMD processor responds to a single min / max instruction to find the minimum or maximum valued data unit in an array of data units. The determined minimum / maximum value and an associated index value thereto may be output. Alternatively, the value of a data unit in another array may be output at a corresponding location. A further single instruction executable by the SIMD processor, may be applied to results obtained using such a single min / max instruction, to allow such instructions to operate on two dimensional arrays.
Owner:AVAGO TECH INT SALES PTE LTD

Method and system for parallel histogram calculation in a simd and vliw processor

The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.
Owner:MIMAR TIBET

SIMD operation system capable of designating plural registers

In view of a necessity of alleviating factors obstructing an effect of SIMD operation such as in-register data alignment in high speed formation of an SIMD processor, numerous data can be supplied to a data alignment operation pipe 211 by dividing a register file into four banks and enabling to designate a plurality of registers by a single piece of operand to thereby enable to make access to four registers simultaneously and data alignment operation can be carried out at high speed. Further, by defining new data pack instruction, data unpack instruction and data permutation instruction, data supplied in a large number can be aligned efficiently. Further, by the above-described characteristic, definition of multiply accumulate operation instruction maximizing parallelism of SIMD can be carried out.
Owner:RENESAS ELECTRONICS CORP

Compilation for a SIMD RISC processor

A computer implemented method, data processing system, and computer usable code are provided for generating code to perform scalar computations on a Single-Instruction Multiple-Data (SIMD) Reduced Instruction Set Computer (RISC) architecture. The illustrative embodiments generate code directed at loading at least one scalar value and generate code using at least one vector operation to generate a scalar result, wherein all scalar computation for integer and floating point data is performed in a SIMD vector execution unit.
Owner:INT BUSINESS MASCH CORP

System and method for processing thread groups in a SIMD architecture

A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.
Owner:NVIDIA CORP

SIMD operation system capable of designating plural registers via one register designating field

In view of a necessity of alleviating factors obstructing an effect of SIMD operation such as in-register data alignment in high speed formation of an SIMD processor, numerous data can be supplied to a data alignment operation pipe 211 by dividing a register file into four banks and enabling to designate a plurality of registers by a single piece of operand to thereby enable to make access to four registers simultaneously and data alignment operation can be carried out at high speed. Further, by defining new data pack instruction, data unpack instruction and data permutation instruction, data supplied in a large number can be aligned efficiently. Further, by the above-described characteristic, definition of multiply accumulate operation instruction maximizing parallelism of SIMD can be carried out.
Owner:RENESAS ELECTRONICS CORP

Merged control/process element processor for executing VLIW simplex instructions with SISD control/SIMD process mode bit

A highly parallel data processing system includes an array of n processing elements (PEs) and a controller sequence processor (SP) wherein at least one PE is combined with the controller SP to create a Dynamic Merged Processor (DP) which supports two modes of operation. In its first mode of operation, the DP acts as one of the PEs in the array and participates in the execution of single-instruction-multiple-data (SIMD) instructions. In the second mode of operation, the DP acts as the controlling element for the array of PEs and executes non-array instructions. To support these two modes of operation, the DP includes a plurality of execution units and two general-purpose register files. The execution units are “shared” in that they can execute instructions in either mode of operation. With very long instruction word (VLIW) capability, both modes of operation can be in effect on a cycle by cycle basis for every VLIW executed. This structure allows the controlling element in a highly parallel SIMD processor to be reused as one of the processing elements in the array to reduce the overall number of transistors and wires in the SIMD processor while maintaining its capabilities and performance.
Owner:ALTERA CORP

Method and apparatus for enable/disable control of SIMD processor slices

Methods and apparatus provide for disabling at least some data path processing circuits of a SIMD processing pipeline, in which the processing circuits are organized into a matrix of slices and stages, in response to one or more enable flags during a given cycle.
Owner:SONY COMPUTER ENTERTAINMENT INC

SIMD processor with exchange sort instruction operating or plural data elements simultaneously

An SIMD type microprocessor having a plurality of processor elements, wherein data stored in a specific register included in each processor element and data stored in an operand-designated source register are compared based on a first type of instruction; after the comparison, a larger data is stored in the specific register; and a smaller data is stored in the source register or an operand-designated destination register other than the source register.
Owner:RICOH KK

Reconfigurable Logic in Processors

A data processor comprises an array of processing elements (PEn 4), each element in the array comprising a respective configurable logic unit (CLU 11), whereby the logic capability of each processing element can be reconfigured at will. Memory (14, FIGS. 3, 4 not shown) may be pre-loaded with configuration instructions, whereby the configuration state of each processing element can be automatically sequenced from the pre-loaded memory. The memory may be global, in which case the CLUs may be reconfigured in parallel, to perform the same function. Alternatively, the memory may be local to each processing element so that different CLUs implement different functions. Configuration may be carried out under program control at a thread switch. Each respective processing element may select, at run time, a specific configuration from a number of configurations in a microcode store. The processor is preferably a SIMD processor.
Owner:CLEARSPEED TECH

Audio and video processing apparatus

The present invention performs video and audio compression / decompression, video input and output scaling, video input and output processing for enhancement, and system layer functions on a single semiconductor chip. The media processor is compromised of video processor with a SIMD vector engine, audio processor, stream processor, system processor, and video scalers, LUTs and hardware blender. Unified memory architecture is used where these four processors use a shared memory for data and instructions. Data transfers between multiple processors use multiple packet-based unidirectional communication channels via hardware-assisted circular queues in unified memory. The video processor is a SIMD processor coupled to a regular RISC processor as a dual-issue processor. Such integrated and programmable functionality provides implementation of multiple video and audio for compression standards and programmable video enhancement. Important applications of this include Digital TV, IP Video Phone, and Digital Camcorder / Camera.
Owner:MIMAR TIBET

Gate-Level Logic Simulator Using Multiple Processor Architectures

Techniques for simulating operation of a connectivity level description of an integrated circuit design are provided, for example, to simulate logic elements expressed through a netlist description. The techniques utilize a host processor selectively partitioning and optimizing the descriptions of the integrated circuit design for efficient simulation on a parallel processor, more particularly a SIMD processor. The description may be segmented into cluster groups, for example macro-gates, formed of logic elements, where the cluster groups are sized for parallel simulation on the parallel processor. Simulation may occur in an oblivious as well as event-driven manner, depending on the implementation.
Owner:RGT UNIV OF MICHIGAN

Vector SIMD processor

A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically. Composition of the floating-point length-4 vector inner product execution units to calculate the sum of the inner product of length-4 vectors and scalar to be compatible with SIMD of four in parallel results in a processing capability of 32 FLOPS per cycle.
Owner:RENESAS ELECTRONICS CORP
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products