89 results about "Scalar processor" patented technology

Scalar processors are a class of computer processors. A scalar processor processes only one data item at a time, with typical data items being integers or floating-point numbers. In Flynn's taxonomy, a scalar processor is classified as a SISD (Single Instruction, Single Data) processor.

Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction

Inactive | US6317819B1 | Register arrangements; Memory addressing/allocation/relocation; Crossbar switch; Digital data

A digital data processor integrated circuit (1) includes a plurality of functionally identical first processor elements (6A) and a second processor element (5). The first processor elements are bidirectionally coupled to a first cache (12) via a crossbar switch matrix (8). The second processor element is coupled to a second cache (11). Each of the first cache and the second cache contains a two-way, set-associative cache memory that uses a least-recently-used (LRU) replacement algorithm and that operates with a use-as-fill mode to minimize the number of wait states said processor elements need to experience before continuing execution after a cache miss. An operation of each of the first processor elements and an operation of the second processor element are locked together during an execution of a single instruction read from the second cache. The instruction specifies, in a first portion that is coupled in common to each of the plurality of first processor elements, the operation of each of the plurality of first processor elements in parallel. A second portion of the instruction specifies the operation of the second processor element. Also included is a motion estimator (7) and an internal data bus coupling together a first parallel port (3A), a second parallel port (3B), a third parallel port (3C), an external memory interface (2), and a data input/output of the first cache and the second cache.

Owner:CUFER ASSET LTD LLC

Cycle segmented prefix circuits

Inactive | US6609189B1 | Improve performance; Avoid performance; Computation using non-contact making devices; General purpose stored program computer; Extensibility; Scalar processor

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU. Similarly, to read an argument register, an instruction must somehow communicate with the most recent preceding instruction that wrote that register. A cspp circuit can implement such functions by computing for each instruction within a wrap-around instruction sequence the accumulative result of applying some associative operator to all the preceding instructions. A cspp circuit has a critical path gate delay logarithmic in the length of the instruction sequence. Depending on its associative operation and its layout, a cspp circuit can have a critical path wire delay sublinear in the length of the instruction sequence.

Owner:YALE UNIV
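
The ALU-assignment example in the abstract above can be modelled in a few lines of Python. This is only a behavioural sketch of what a cyclic segmented prefix computes, not the logarithmic-depth circuit the patent describes: it walks the ring sequentially. The function name `cyclic_segmented_prefix`, the `starts` flags marking the oldest instruction, and the choice of an exclusive prefix (each slot sees only older slots) are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cyclic_segmented_prefix(
    values: Sequence[T],
    starts: Sequence[bool],
    op: Callable[[T, T], T],
    identity: T,
) -> List[T]:
    """Behavioural model of a cyclic segmented prefix (assumed semantics).

    For each slot i of a wrap-around window, return the combination (under
    the associative operator `op`) of the values of all slots between the
    nearest segment start at or before i and slot i itself, excluding i.
    """
    n = len(values)
    if n != len(starts):
        raise ValueError("values and starts must have the same length")
    if not any(starts):
        raise ValueError("at least one slot must be marked as a segment start")

    out: List[T] = [identity] * n
    first = next(i for i, s in enumerate(starts) if s)
    acc = identity
    # Walk the ring exactly once, beginning at a segment start.
    for k in range(n):
        i = (first + k) % n
        if starts[i]:
            acc = identity  # nothing precedes the first slot of a segment
        out[i] = acc
        acc = op(acc, values[i])
    return out


# Four instruction slots; slot 2 holds the oldest instruction.
requests = [True, False, False, True]   # which slots request an ALU
oldest = [False, False, True, False]
older_request = cyclic_segmented_prefix(
    requests, oldest, lambda a, b: a or b, False
)
# A slot wins the ALU if it requests one and no older slot does.
grants = [r and not o for r, o in zip(requests, older_request)]
```

With these inputs, program order around the ring is 2, 3, 0, 1, so slot 3 is the oldest requester and is the only slot granted the ALU. A hardware implementation would compute the same values with a tree of operators rather than a loop.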

Macroscalar processor architecture

Active | US20060004996A1 | Software engineering; Digital computer details; Scalar processor; Parallel computing

A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Owner:APPLE INC

Fast just-in-time (JIT) scheduler

Inactive | US6139199A | Software engineering; General purpose stored program computer; Byte; Basic block

A just-in-time (JIT) compiler typically generates code from bytecodes that have a sequence of assembly instructions forming a "template". It has been discovered that a just-in-time (JIT) compiler generates a small number of assembly instructions per bytecode, approximately 2.3. It has also been discovered that, within a template, the assembly instructions are almost always dependent on the next assembly instruction. The absence of a dependence between instructions of different templates is exploited to increase the size of issue groups using scheduling. A fast method for scheduling program instructions is useful in just-in-time (JIT) compilers. Scheduling of instructions is generally useful for just-in-time (JIT) compilers that are targeted to in-order superscalar processors because the code generated by the JIT compilers is often sequential in nature. The disclosed fast scheduling method has a complexity, and therefore an execution time, that is proportional to the number of instructions in an instruction block (N complexity), a substantial improvement in comparison to the N² complexity of conventional compiler schedulers. The described fast scheduler advantageously reorders instructions with a single pass, or few passes, through a basic instruction block, while a conventional compiler scheduler such as the DAG scheduler must iterate over an instruction basic block many times. A fast scheduler operates using an analysis of a sliding window of three instructions, applying two rules within the three-instruction window to determine when to reorder instructions. The analysis includes acquiring the opcodes and operands of each instruction in the three-instruction window, and determining register usage and definition of the operands of each instruction with respect to the other instructions within the window. The rules are applied to determine ordering of the instructions within the window.

Owner:ORACLE INT CORP

Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions

Inactive | US6457120B1 | Memory addressing/allocation/relocation; Digital computer details; Scalar processor; Superscalar

A superscalar processor and method are disclosed for improving the accuracy of predictions of a destination of a branch instruction utilizing a cache. The cache is established including multiple entries. Each of multiple branch instructions is associated with one of the entries of the cache. One of the entries of the cache includes a stored predicted destination for the branch instruction associated with this entry of the cache. The predicted destination is a destination the branch instruction is predicted to branch to upon execution of the branch instruction. The stored predicted destination is updated in the one of the entries of the cache only in response to two consecutive mispredictions of the destination of the branch instruction, wherein the two consecutive mispredictions were made utilizing the one of the entries of the cache.

Owner:IBM CORP
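
The update rule in the abstract above (replace the stored destination only after two consecutive mispredictions from the same entry) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patented design: the class name `BranchTargetCache`, the modulo indexing, the per-entry miss counter standing in for the confirmation bits, and the behaviour of filling an empty entry immediately are all hypothetical.

```python
from typing import List, Optional


class BranchTargetCache:
    """Toy target cache that tolerates one misprediction before updating."""

    def __init__(self, num_entries: int = 16) -> None:
        self.num_entries = num_entries
        self.targets: List[Optional[int]] = [None] * num_entries
        # Consecutive mispredictions made with each entry (assumed encoding).
        self.miss_streak: List[int] = [0] * num_entries

    def _index(self, branch_addr: int) -> int:
        # Assumption: direct-mapped lookup on the low bits of the address.
        return branch_addr % self.num_entries

    def predict(self, branch_addr: int) -> Optional[int]:
        return self.targets[self._index(branch_addr)]

    def update(self, branch_addr: int, actual_target: int) -> None:
        i = self._index(branch_addr)
        if self.targets[i] is None:
            # Assumption: an empty entry is filled on first use.
            self.targets[i] = actual_target
            self.miss_streak[i] = 0
        elif self.targets[i] == actual_target:
            self.miss_streak[i] = 0  # correct prediction confirms the entry
        else:
            self.miss_streak[i] += 1
            if self.miss_streak[i] >= 2:
                # Second consecutive misprediction: replace the target.
                self.targets[i] = actual_target
                self.miss_streak[i] = 0


btc = BranchTargetCache()
btc.update(0x40, 0x100)           # entry learns target 0x100
btc.update(0x40, 0x200)           # one misprediction: target is kept
after_one_miss = btc.predict(0x40)
btc.update(0x40, 0x200)           # second consecutive miss: target replaced
after_two_misses = btc.predict(0x40)
```

The point of the delay is that a branch which only occasionally goes elsewhere does not lose its usual destination after a single exception.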

Macroscalar processor architecture

Active | US7395419B1 | Software engineering; Digital computer details; Scalar processor; Parallel computing

A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.

Owner:APPLE INC

Processor with demand-driven clock throttling power reduction

Inactive | US20040044915A1 | Energy efficient ICT; Volume/mass flow measurement; Scalar processor; Processor register

A synchronous integrated circuit such as a scalar processor or superscalar processor. Circuit components or units are clocked by and synchronized to a common system clock. At least two of the clocked units include multiple register stages, e.g., pipeline stages. A local clock generator in each clocked unit combines the common system clock and stall status from one or more other units to adjust register clock frequency up or down.

Owner:IBM CORP

Vector processor

Active | US20070255894A1 | General purpose stored program computer; Digital storage; Processing Instruction; Scalar processor

A vector processing system provides high performance vector processing using a System-On-a-Chip (SOC) implementation technique. One or more scalar processors (or cores) operate in conjunction with a vector processor, and the processors collectively share access to a plurality of memory interfaces coupled to Dynamic Random Access read / write Memories (DRAMs). In typical embodiments the vector processor operates as a slave to the scalar processors, executing computationally intensive Single Instruction Multiple Data (SIMD) codes in response to commands received from the scalar processors. The vector processor implements a vector processing Instruction Set Architecture (ISA) including machine state, instruction set, exception model, and memory model.

Owner:HESSEL RICHARD +1

Processor pipeline including partial replay

Inactive | US6076153A | General purpose stored program computer; Concurrent instruction execution; Speculative execution; Scalar processor

The invention, in one embodiment, is a method for committing the results of at least two speculatively executed instructions to an architectural state in a superscalar processor. The method includes determining which of the speculatively executed instructions encountered a problem in execution, and replaying the instruction that encountered the problem in execution while retaining the results of executing the instruction that did not encounter the problem.

Owner:INTEL CORP

Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address

Active | US7493475B2 | Register arrangements; General purpose stored program computer; Multiplexing; Scalar processor

An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.

Owner:STMICROELECTRONICS SRL

Macroscalar processor architecture

Active | US7617496B2 | Software engineering; Digital computer details; Scalar processor; Parallel computing

A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Owner:APPLE INC

Processor with demand-driven clock throttling power reduction

Inactive | US7076681B2 | Reduce consumption; No performance loss; Energy efficient ICT; Volume/mass flow measurement; Scalar processor; Processor register

A synchronous integrated circuit such as a scalar processor or superscalar processor. Circuit components or units are clocked by and synchronized to a common system clock. At least two of the clocked units include multiple register stages, e.g., pipeline stages. A local clock generator in each clocked unit combines the common system clock and stall status from one or more other units to adjust register clock frequency up or down.

Owner:INT BUSINESS MASCH CORP

Instruction issue control within a multi-threaded in-order superscalar processor

Active | US20080270749A1 | Performance constrained; Reduce hardware overhead; General purpose stored program computer; Memory systems; Scalar processor; Program Thread

A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one or more other instructions.

Owner:ARM LTD

SIMD processor with scalar arithmetic logic units

Active | US7146486B1 | Shorten the time; Avoid conflict; Concurrent instruction execution; Architecture with single central processing unit; Arithmetic logic unit; Scalar processor

A scalar processor that includes a plurality of scalar arithmetic logic units and a special function unit. Each scalar unit performs, in a different time interval, the same operation on a different data item, where each different time interval is one of a plurality of successive, adjacent time intervals. Each unit provides an output data item in the time interval in which the unit performs the operation and provides a processed data item in the last of the successive, adjacent time intervals. The special function unit provides a special function computation for the output data item of a selected one of the scalar units, in the time interval in which the selected scalar unit performs the operation, so as to avoid a conflict in use among the scalar units. A vector processing unit includes an input data buffer, the scalar processor, and an output orthogonal converter.

Owner:S3 GRAPHICS

Histogram generation with vector operations in SIMD and VLIW processor by consolidating LUTs storing parallel update incremented count values for vector data elements

Inactive | US7506135B1 | Program control; Architecture with multiple processing units; Scalar processor; Simd processor

The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speedup of histogram calculation by a factor of N over a scalar processor, where the SIMD processor can perform N LUT operations per instruction. The histogram operation is partitioned into a vector LUT operation, followed by a vector increment, a vector LUT update, and finally a reduction of the vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other types of multi-component images.

Owner:MIMAR TIBET
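
The four phases named in the abstract above (vector LUT read, vector increment, vector LUT update, final reduction) can be mimicked in plain Python. This is a sketch of the general idea only: giving each lane its own private count table is an assumption about how lane collisions are avoided, and the function name `lane_histogram` and its parameters are hypothetical. The loop over lanes stands in for what a SIMD unit would do in one instruction.

```python
from typing import List, Sequence


def lane_histogram(
    pixels: Sequence[int], num_bins: int = 256, lanes: int = 4
) -> List[int]:
    """Histogram built from per-lane count tables, then reduced."""
    # One private count table (LUT) per lane, so two lanes holding the
    # same pixel value never update the same counter in one step.
    tables = [[0] * num_bins for _ in range(lanes)]

    for base in range(0, len(pixels), lanes):
        chunk = pixels[base:base + lanes]  # one "vector" of pixel values
        # Vector LUT read: each lane fetches the count for its pixel value.
        counts = [tables[lane][v] for lane, v in enumerate(chunk)]
        # Vector increment.
        counts = [c + 1 for c in counts]
        # Vector LUT update: each lane writes its new count back.
        for lane, (v, c) in enumerate(zip(chunk, counts)):
            tables[lane][v] = c

    # Reduction: sum the per-lane tables into the final histogram.
    return [sum(table[b] for table in tables) for b in range(num_bins)]


hist = lane_histogram([0, 1, 1, 3, 3, 3, 2, 0, 1], num_bins=4, lanes=4)
```

A scalar loop would perform one read-increment-write per pixel, while this layout handles `lanes` pixels per step at the cost of `lanes` times the table storage and a final reduction pass.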

Instruction issue control within a multi-threaded in-order superscalar processor

Active | US7707390B2 | Performance constrained; Increase flexibility; General purpose stored program computer; Memory systems; Scalar processor; Program Thread

A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one or more other instructions.

Owner:ARM LTD

Superscalar processor having content addressable memory structures for determining dependencies

Inactive | US6862676B1 | Lower requirement; Rapid design; Memory addressing/allocation/relocation; General purpose stored program computer; Scalar processor; Order processing

A superscalar processor having a content addressable memory structure that transmits a first and second output signal is presented. The superscalar processor performs out of order processing on an instruction set. From the first output signal, the dependencies between currently fetched instructions of the instruction set and previous in-flight instructions can be determined and used to generate a dependency matrix for all in-flight instructions. From the second output signal, the physical register addresses of the data required to execute an instruction, once the dependencies have been removed, may be determined.

Owner:ORACLE INT CORP

Method and system for parallel histogram calculation in a simd and vliw processor

Inactive | US20090276606A1 | Program control using stored programs; General purpose stored program computer; Scalar processor; Parallel computing

The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speedup of histogram calculation by a factor of N over a scalar processor, where the SIMD processor can perform N LUT operations per instruction. The histogram operation is partitioned into a vector LUT operation, followed by a vector increment, a vector LUT update, and finally a reduction of the vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other types of multi-component images.

Owner:MIMAR TIBET

Quantifying Completion Stalls Using Instruction Sampling

Inactive | US20090259830A1 | Error detection/correction; Digital computer details; Data processing system; Scalar processor

A method, computer program product, and data processing system for collecting metrics regarding completion stalls in an out-of-order superscalar processor with branch prediction is disclosed. A preferred embodiment of the present invention selectively samples particular instructions (or classes of instructions). Each selected instruction, as it passes through the processor datapath, is marked (tagged) for monitoring by a performance monitoring unit. The progress of marked instructions is monitored by the performance monitoring unit, and various stall counters are triggered by the progress of the marked instructions and the instruction groups they form a part of. The stall counters count cycles to give an indication of when certain delays associated with particular instructions occur and how serious the delays are.

Owner:IBM CORP

Data processing units

Active | US20130331954A1 | Computer control; Simulator control; Scalar processor; Data processing

A data processing unit combines a scalar processor and a heterogeneous processor which includes a vector processing array. The vector processing array includes a plurality of vector processors which are operable in a single instruction multiple data configuration.

Owner:BLUWIRELESS TECH

Method and device for realizing instruction cache path selection in superscaler processor

Active | CN102306092A | Improve energy efficiency; Reduce energy consumption; Concurrent instruction execution; Scalar processor; Parallel computing

The invention discloses a method and a device for realizing instruction cache path selection in a superscalar processor. The method comprises the following steps: judging a fetch mode at least according to an instruction fetch request; performing path prediction with a path history mode when the fetch mode belongs to a sequential fetch scene; and performing path prediction with a path prediction mode when the fetch mode belongs to a non-sequential fetch scene. The energy efficiency of the superscalar processor is thereby increased as a whole, and its overall energy consumption is reduced, because a large number of unnecessary path Tag comparisons and Data accesses are avoided and few extra resources are used.

Owner:BEIJING PKUNITY MICROSYST TECH

Processing prefix code in instruction queue storing fetched sets of plural instructions in superscalar processor

Inactive | US8402256B2 | Improve performance; Increase power; Instruction analysis; Runtime instruction translation; Instruction unit; Scalar processor

The present invention is directed to realize efficient issue of a superscalar instruction in an instruction set including an instruction with a prefix. A circuit is employed which retrieves an instruction of each instruction code type other than a prefix on the basis of a determination result of decoders for determining an instruction code type, adds the immediately preceding instruction to the retrieved instruction, and outputs the resultant to instruction executing means. When an instruction of a target instruction code type is detected in a plurality of instruction units to be searched, the circuit outputs the detected instruction code and the immediately preceding instruction other than the target instruction code type as prefix code candidates. When an instruction of a target instruction code type cannot be detected at the rear end of the instruction units to be searched, the circuit outputs the instruction at the rear end as a prefix code candidate. When an instruction of a target instruction code type is detected at the head in the instruction code search, the circuit outputs the instruction code at the head.

Owner:RENESAS ELECTRONICS CORP

Vector processor

Active | US7543119B2 | General purpose stored program computer; Digital storage; Processing Instruction; Scalar processor

A vector processing system provides high performance vector processing using a System-On-a-Chip (SOC) implementation technique. One or more scalar processors (or cores) operate in conjunction with a vector processor, and the processors collectively share access to a plurality of memory interfaces coupled to Dynamic Random Access read / write Memories (DRAMs). In typical embodiments the vector processor operates as a slave to the scalar processors, executing computationally intensive Single Instruction Multiple Data (SIMD) codes in response to commands received from the scalar processors. The vector processor implements a vector processing Instruction Set Architecture (ISA) including machine state, instruction set, exception model, and memory model.

Owner:HESSEL RICHARD +1

Method for variable length opcode mapping in a VLIW processor

Inactive | US20110072238A1 | Reduce waste; Savings in program memory; Conditional code generation; Program control using stored programs; Variable-length code; Scalar processor

The present invention provides a method for reducing the program memory size required for a dual-issue processor with a scalar processor plus a SIMD vector processor. Coding the map of the next group of instruction pairs in a no-operation (NOP) instruction of the scalar and vector processor reduces the cases where one of the scalar or vector opcodes is a NOP opcode. A NOP for either the scalar or the vector processor defines the next 13 instructions as scalar-plus-vector, scalar-followed-by-scalar, or vector-followed-by-vector, so that the execution unit performs accordingly until the next NOP or a branch instruction.

Owner:MIMAR TIBET

Superscaler processor and method for efficiently recovering from misaligned data addresses

Inactive | US6289428B1 | Memory addressing/allocation/relocation; Digital computer details; Scalar processor; Data segment

A superscalar processor and method are disclosed for efficiently recovering from misaligned data addresses. The processor includes a memory device partitioned into a plurality of addressable memory units. Each of the plurality of addressable memory units has a width of a first plurality of bytes. A determination is made regarding whether a data address included within a memory access instruction is misaligned. The data address is misaligned if it includes a first data segment located in a first addressable memory unit and a second data segment located in a second addressable memory unit where the first and second data segments are separated by an addressable memory unit boundary. In response to a determination that the data address is misaligned, a first internal instruction is executed which accesses the first memory unit and obtains the first data segment. A second internal instruction is executed which accesses the second memory unit and obtains the second data segment. The first and second data segments are merged together. All of the instructions executed by the processor are constrained by the memory boundary and do not access memory across the memory boundary.

Owner:IBM CORP
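
The recovery sequence in the abstract above (detect a boundary-crossing address, issue two accesses that each stay inside one memory unit, then merge the two segments) can be sketched as follows. The unit width `UNIT = 8` and the function names are assumptions chosen for illustration; the abstract only says each unit has "a width of a first plurality of bytes".

```python
UNIT = 8  # assumed width of one addressable memory unit, in bytes


def read_within_unit(memory: bytes, addr: int, size: int) -> bytes:
    """Model of one internal access, which may not cross a unit boundary."""
    if addr // UNIT != (addr + size - 1) // UNIT:
        raise ValueError("access crosses a memory unit boundary")
    return memory[addr:addr + size]


def load(memory: bytes, addr: int, size: int) -> bytes:
    """Load `size` bytes at `addr`, splitting the access if misaligned."""
    if not 1 <= size <= UNIT:
        raise ValueError("size must be between 1 and UNIT bytes")
    boundary = (addr // UNIT + 1) * UNIT  # start of the next memory unit
    if addr + size <= boundary:
        # Aligned case: the data sits entirely inside one unit.
        return read_within_unit(memory, addr, size)
    # Misaligned case: one access per unit, then merge the two segments.
    first = read_within_unit(memory, addr, boundary - addr)
    second = read_within_unit(memory, boundary, addr + size - boundary)
    return first + second


memory = bytes(range(32))
straddling = load(memory, 6, 4)   # bytes 6..9 span units 0 and 1
aligned = load(memory, 8, 8)      # exactly one unit
```

Because `size` is capped at `UNIT`, a load can touch at most two units, which matches the two internal instructions the abstract describes.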

Multi-pipe dispatch and execution of complex instructions in a superscalar processor

Active | US7085917B2 | Performance maximization; Improve performance; Cooking-vessel materials; Runtime instruction translation; Scalar processor; Control signal

In a computer system, a method and apparatus for dispatching and executing multi-cycle and complex instructions. The method results in maximum performance for such instructions without impacting other areas in the processor such as decode, grouping or dispatch units. This invention allows multi-cycle and complex instructions to be dispatched to one port but executed in multiple execution pipes without cracking the instruction and without limiting it to a single execution pipe. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). The FXU logic then executes these instructions on the available FXU pipes. This method results in optimum performance with little or no other complications. The presented technique places the flexibility of how these instructions will be executed in the FXU, where the actual execution takes place, instead of in the instruction decode or dispatch units or cracking by the compiler.

Owner:IBM CORP

Issue policy control within a multi-threaded in-order superscalar processor

Active | US20080282067A1 | Improve performance | Digital computer details | Memory systems | Scalar processor | Execution unit

A multi-threaded in-order superscalar processor 2 includes an issue stage 12 including issue circuitry 22, 24 for selecting instructions to be issued to execution units 14, 16 in dependence upon a currently selected issue policy. A plurality of different issue policies are provided by associated different policy circuitry 28, 30, 32 and a selection between which of these instances of the policy circuitry 28, 30, 32 is active is made by policy selecting circuitry 34 in dependence upon detected dynamic behaviour of the processor 2.

Owner:ARM LTD
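As a rough software analogy for selecting among issue policies based on observed behaviour, consider the sketch below. The two policies, the stall-count metric, and the threshold are all hypothetical; the abstract does not say which policies or which dynamic signals the circuitry uses.

```python
from typing import Callable, Dict, List


# Each policy picks the thread that issues next, given the ids of the ready
# threads and the id of the thread that issued last. Both are made-up examples.
def round_robin(ready: List[int], last: int) -> int:
    later = [t for t in ready if t > last]
    return min(later) if later else min(ready)


def lowest_id_first(ready: List[int], last: int) -> int:
    return min(ready)


POLICIES: Dict[str, Callable[[List[int], int], int]] = {
    "round_robin": round_robin,
    "lowest_id_first": lowest_id_first,
}


def select_policy(stall_cycles: int, threshold: int = 4) -> str:
    """Stand-in for the policy selecting circuitry: choose a policy from an
    observed dynamic metric (here, an assumed count of recent stall cycles)."""
    return "round_robin" if stall_cycles >= threshold else "lowest_id_first"


def issue(ready: List[int], last: int, stall_cycles: int) -> int:
    """Return the id of the thread allowed to issue this cycle."""
    policy = POLICIES[select_policy(stall_cycles)]
    return policy(ready, last)
```

The point of the structure is that the policies stay fixed and independent, while a separate selector decides at run time which one is active.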

Macroscalar processor architecture

Active | US20080229076A1 | Resource allocation | Software engineering | Scalar processor | Parallel computing

A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.

Owner:APPLE INC
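The relationship between primary and secondary instructions in this abstract can be pictured with a small model. This is only a sketch under assumptions: the `Primary` and `Secondary` records, their fields, and the `expand` helpers are hypothetical and say nothing about how an actual iteration unit encodes instructions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Primary:
    """One instruction from the loop body (fields are illustrative)."""
    op: str
    dst: str
    src: str


@dataclass(frozen=True)
class Secondary:
    """One per-iteration copy of a primary instruction."""
    op: str
    dst: str
    src: str
    iteration: int


def expand(primary: Primary, iterations: int) -> List[Secondary]:
    """Generate one secondary instruction per loop iteration."""
    return [
        Secondary(primary.op, primary.dst, primary.src, i)
        for i in range(iterations)
    ]


def expand_loop(body: List[Primary], iterations: int) -> List[Secondary]:
    """Expand every primary instruction of a loop body in program order."""
    return [s for p in body for s in expand(p, iterations)]


body = [Primary("load", "t", "a"), Primary("add", "b", "t")]
secondaries = expand_loop(body, iterations=4)
```

Here each primary instruction yields four secondary instructions, one for each of four loop iterations, which the execution units could then process.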

Pipelined instruction dispatch unit in a superscalar processor

Inactive | USRE38599E1 | Avoid complex process | Digital computer details | Concurrent instruction execution | Scalar processor | Scheduling instructions

A pipelined instruction dispatch or grouping circuit allows instruction dispatch decisions to be made over multiple processor cycles. In one embodiment, the grouping circuit performs resource allocation and data dependency checks on an instruction group, based on a state vector which includes representation of source and destination registers of instructions within said instruction group and corresponding state vectors for instruction groups of a number of preceding processor cycles.

Owner:SUN MICROSYSTEMS INC
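One way to picture a dependency check driven by state vectors is to encode register usage as bitmasks. The encoding below is an assumption chosen for illustration (the abstract does not specify the vector format), and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# An instruction is modelled as (source registers, destination registers).
Instr = Tuple[Iterable[int], Iterable[int]]
# A state vector is modelled as (source bitmask, destination bitmask).
StateVector = Tuple[int, int]


def mask(regs: Iterable[int]) -> int:
    """Set one bit per register number."""
    m = 0
    for r in regs:
        m |= 1 << r
    return m


def state_vector(group: List[Instr]) -> StateVector:
    """Summarise the registers read and written by an instruction group."""
    src = dst = 0
    for sources, dests in group:
        src |= mask(sources)
        dst |= mask(dests)
    return src, dst


def has_dependency(group: List[Instr], preceding: List[StateVector]) -> bool:
    """True if the group reads a register written by any preceding group."""
    src, _ = state_vector(group)
    return any(src & prev_dst for _, prev_dst in preceding)


earlier = [state_vector([((1, 2), (3,))])]        # earlier group writes r3
dependent = has_dependency([((3,), (4,))], earlier)    # reads r3
independent = has_dependency([((1,), (5,))], earlier)  # reads r1 only
```

Keeping one vector per preceding cycle lets the check consult several cycles of history with a few bitwise operations, which is in the spirit of spreading the dispatch decision over multiple cycles.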

Data processor for improving storage instruction execution efficiency

Inactive | CN102495724A | Reduce pause | Improve execution efficiency | Program initiation/switching | Concurrent instruction execution | Scalar processor | Scheduling instructions

A data processor for improving store instruction execution efficiency comprises a register file, an instruction decoding unit, an instruction scheduling unit, a store instruction queue, and instruction execution units. The instruction scheduling unit forwards the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands, and issues each instruction whose operands have been forwarded to the corresponding instruction execution unit. The store instruction queue receives store instructions from the instruction decoding unit, holds the write-back data and related information of each store instruction, monitors the output data of all execution units, and forwards the write-back data of a store instruction according to the dependency information of its data operand. The instruction execution units receive instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type. The data processor effectively reduces pipeline stalls caused by true read-after-write data dependencies, improves the execution efficiency of store instructions, and improves overall performance.

Owner:C SKY MICROSYST CO LTD
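The idea of a queue that holds a store until its data appears on an execution unit output can be sketched as follows. This is a simplified model under assumptions: the tag-matching scheme, the `StoreQueue` class, and its method names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    address: int
    data_tag: int               # assumed tag of the instruction producing the data
    data: Optional[int] = None  # filled in once the producer's result is seen


class StoreQueue:
    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def push(self, address: int, data_tag: int) -> None:
        """Accept a store whose data operand is not yet available."""
        self.entries.append(StoreEntry(address, data_tag))

    def snoop(self, tag: int, value: int) -> None:
        """Watch an execution unit output and capture matching store data."""
        for entry in self.entries:
            if entry.data is None and entry.data_tag == tag:
                entry.data = value

    def drain(self, memory: Dict[int, int]) -> None:
        """Write the oldest stores to memory, in order, once their data is ready."""
        while self.entries and self.entries[0].data is not None:
            entry = self.entries.pop(0)
            memory[entry.address] = entry.data


memory: Dict[int, int] = {}
queue = StoreQueue()
queue.push(address=0x10, data_tag=7)
queue.drain(memory)              # nothing is written yet: data still pending
queue.snoop(tag=7, value=42)     # the producer's result appears
queue.drain(memory)              # the store now completes
```

Because the store waits in the queue rather than in the pipeline, later instructions in this model are not held up by the store's pending data operand.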

© 2025 PatSnap. All rights reserved.