
37 results about "Jazelle" patented technology

Jazelle DBX (Direct Bytecode eXecution) is an extension that allows some ARM processors to execute Java bytecode in hardware as a third execution state alongside the existing ARM and Thumb modes. Jazelle functionality was specified in the ARMv5TEJ architecture and the first processor with Jazelle technology was the ARM926EJ-S. Jazelle is denoted by a "J" appended to the CPU name, except for post-v5 cores where it is required (albeit only in trivial form) for architecture conformance.
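
What Jazelle accelerates is the bytecode dispatch loop that a software JVM would otherwise run on the ARM core. A minimal sketch of that loop for a few illustrative JVM-style opcodes (the opcode names and encoding here are simplified stand-ins, not the real class-file format):

```python
# A software interpreter for a stack-based bytecode, the fetch-decode-execute
# loop that Jazelle moves into the processor pipeline as a third execution state.
def interpret(code):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]; pc += 1
        if op == "iconst":                  # push an immediate integer
            stack.append(code[pc]); pc += 1
        elif op == "iadd":                  # pop two, push sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "imul":                  # pop two, push product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ireturn":               # return top of stack
            return stack.pop()
        else:
            raise ValueError(f"unhandled opcode {op!r}")

# (2 + 3) * 4
program = ["iconst", 2, "iconst", 3, "iadd", "iconst", 4, "imul", "ireturn"]
print(interpret(program))  # → 20
```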

Automated development method for procedure-level software/hardware co-design

The invention provides an automated development method for procedure-level software/hardware co-design, comprising the following steps: step 1, describing the system function in a high-level language, including calls to the co-designed software/hardware functions; step 2, dynamically partitioning the functions between software and hardware; step 3, linking and executing; and step 4, judging whether all functions have finished executing, ending if so, and otherwise returning the partitioning parameters to step 2 for the next iteration. The invention uses a unified procedure-level software/hardware programming model to hide the differences of the underlying hardware implementations, so that the reconfigurable devices are transparent to the program's users. The programming model encapsulates hardware accelerators as C-language functions for convenient programming, and supports dynamic software/hardware partitioning at run time, so the partitioning is transparent to programmers and the utilization of reconfigurable resources is improved.
Owner:HUNAN UNIV
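
The run-time partitioning described above can be sketched as a dispatcher that wraps each function with a software body and an optional accelerator path, keeping the split invisible to the caller. The names (`hw_free`, `accel_call`) and the availability flag are illustrative assumptions, not details from the patent:

```python
# Sketch of run-time software/hardware partitioning: a wrapper chooses, per
# call, between a hardware-accelerator stand-in and the software fallback,
# so the partitioning decision stays transparent to the programmer.
hw_free = True  # assumed flag: is the reconfigurable fabric available?

def accel_call(name, *args):
    # Stand-in for driving a real reconfigurable accelerator.
    return {"fir": lambda xs: sum(xs)}[name](*args)

def partitioned(name, sw_impl):
    def wrapper(*args):
        if hw_free:                        # dynamic partitioning decision
            return accel_call(name, *args)
        return sw_impl(*args)              # software fallback
    return wrapper

fir = partitioned("fir", lambda xs: sum(xs))
print(fir([1, 2, 3]))  # → 6
```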

Domestically developed heterogeneous computing acceleration platform

The invention relates to a domestically developed heterogeneous computing acceleration platform comprising an accelerator hardware platform, an operating system layer, a GPU accelerator driver layer, an FPGA accelerator driver layer, heterogeneous acceleration-stack middleware, application programs and an acceleration library. The accelerator hardware platform is responsible for allocating and scheduling computing and storage resources; the GPU and FPGA accelerator driver layers provide management interfaces to the internal resources of the underlying hardware for the heterogeneous-platform middleware to call; the acceleration-stack middleware maps the computing and storage resources of the heterogeneous system into operating-system user space and provides a standardized calling interface for top-level applications; and the acceleration library provides parallelized basic operations with low-level optimization. When an application runs, the host submits a compute kernel and execution instructions, and the computation is executed on the device's compute units. By constructing a heterogeneous many-core acceleration stack and a heterogeneous parallel computing framework, the differences between heterogeneous system platforms are hidden and a domestically developed heterogeneous acceleration software/hardware platform is achieved.
Owner:TIANJIN QISUO PRECISION ELECTROMECHANICAL TECH
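
The middleware's role, mapping different backends behind one standardized calling interface, can be sketched as a registry with a scheduling preference. Class and method names here are invented for illustration:

```python
# Sketch of acceleration-stack middleware: backends register under one
# standardized interface, hiding GPU/FPGA differences from the application.
class AccelStack:
    def __init__(self):
        self.backends = {}

    def register(self, kind, run):
        self.backends[kind] = run

    def submit(self, kernel, data, prefer=("gpu", "fpga", "cpu")):
        for kind in prefer:               # simple resource-scheduling policy
            if kind in self.backends:
                return self.backends[kind](kernel, data)
        raise RuntimeError("no backend available")

stack = AccelStack()
stack.register("cpu", lambda k, d: [k(x) for x in d])  # host fallback backend
print(stack.submit(lambda x: x * x, [1, 2, 3]))  # → [1, 4, 9]
```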

Automatic porting and optimization method for heterogeneous parallel programs

The invention discloses an automatic porting and optimization method for heterogeneous parallel programs, belonging to heterogeneous parallel program development technology. The invention aims to port CPU parallel programs automatically and improve program performance while reducing developers' workload, thereby solving the problems of parallel-directive conversion, data-transfer management and optimization. A framework for an automatic porting system is constructed that automatically translates an OpenMP CPU parallel program into an OpenMP Offloading heterogeneous parallel program. Consistency-state conversion is formalized so that, on the premise of guaranteeing data consistency, transfer operations are optimized and redundant data transfers are reduced. A runtime library is designed that provides automatic data-transfer management and optimization and maintains the consistency state of each variable's memory region; and a source-to-source translator is designed that automatically converts parallel directives and inserts runtime API calls. The method can automatically identify CPU parallel directives and convert them into accelerator parallel directives, improving program performance.
Owner:HARBIN INST OF TECH
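
The directive conversion at the heart of this method can be illustrated with a toy source-to-source pass that rewrites an OpenMP CPU directive into an OpenMP Offloading directive and adds an explicit data-mapping clause. A real translator works on an AST with consistency-state analysis, not a regex; this only shows the shape of the transformation:

```python
import re

# Toy source-to-source pass: rewrite "#pragma omp parallel for" into the
# offloading form and append a map() clause for the variables the (assumed)
# analysis decided must be transferred.
def offload(src, mapped_vars):
    clause = f" map(tofrom: {', '.join(mapped_vars)})"
    return re.sub(r"#pragma omp parallel for",
                  "#pragma omp target teams distribute parallel for" + clause,
                  src)

cpu_src = "#pragma omp parallel for\nfor (i = 0; i < n; i++) a[i] += b[i];"
print(offload(cpu_src, ["a", "b"]))
```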

Method and device for generating a trace-record compression scheme for an application program

The invention relates to a method for generating a trace-record compression scheme for an application program that runs on a heterogeneous multi-core architecture comprising a main processor and a plurality of accelerators. The method comprises the following steps: running at least part of the application program on the heterogeneous multi-core architecture; collecting and analyzing the trace records, comprising event identifiers and trace data, generated by the accelerators; and generating the application program's trace-record compression scheme from the analysis result. The method can compress not only fully identical trace records but also partially identical ones, using a trace-record pattern similar to the records themselves. Compressing the trace records of high-frequency events shortens the time spent transmitting them and thereby reduces the influence on the application program's behavior. The invention also discloses a device for generating the trace-record compression scheme, a method and device for compressing an application program's trace records, and the heterogeneous multi-core architecture.
Owner:IBM CORP
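
The two cases the abstract distinguishes, fully identical records and partially identical ones, can be sketched over `(event_id, payload)` tuples: identical consecutive records collapse into a repeat count, and a record sharing only the previous event identifier stores just its differing payload. The record shape and scheme names are illustrative assumptions, not the patented encoding:

```python
# Sketch of trace compression: "rep" entries carry a record and a repeat
# count; "delta" entries carry only the payload of a record whose event id
# matches the preceding "rep" entry.
def compress(records):
    out = []
    for rec in records:
        if out and out[-1][0] == "rep" and out[-1][1] == rec:
            out[-1][2] += 1                  # fully identical: bump count
        elif out and out[-1][0] == "rep" and out[-1][1][0] == rec[0]:
            out.append(["delta", rec[1]])    # same event id: payload only
        else:
            out.append(["rep", rec, 1])      # new record
    return out

trace = [("load", 8), ("load", 8), ("load", 12), ("store", 4)]
print(compress(trace))  # → [['rep', ('load', 8), 2], ['delta', 12], ['rep', ('store', 4), 1]]
```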

Floating-point unit of a Java processor and control method thereof

The invention discloses a floating-point unit (FPU) for a Java processor and a control method thereof, meeting an embedded system's requirements for high-precision arithmetic. The FPU serves as a hardware accelerator connected directly to the Java processor core; its input port receives a clock, a start signal, operand A, operand B, an operator and a rounding mode from the core. The FPU comprises operation units and a demultiplexer. It selects the corresponding operation unit according to the input operator, and the operation units implement the basic operations, being divided into an addition/subtraction unit, a multiplication unit and a division unit. The demultiplexer selects the result register of the corresponding operation unit according to the input operator and checks whether the operation and the result to be output trigger a floating-point exception as defined by the standard; if so, it outputs the corresponding exception signal; otherwise the FPU's output port sends a completion signal and the result in the register to the Java processor core.
Owner:SYSU HUADU IND SCI & TECH INST
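
The control flow described, operator-selected operation unit, then an exception check before the completion signal, can be sketched in a few lines. The unit table, the divide-by-zero handling, and the two exception labels are simplifications assumed for illustration, not the patent's exact signal set:

```python
import math

# Sketch of the FPU control path: the operator selects an operation unit,
# and the result is checked for IEEE 754 exceptional values before a
# completion signal (here, a normal return) is produced.
UNITS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0.0 else math.copysign(math.inf, a),
}

def fpu(op, a, b):
    result = UNITS[op](a, b)
    if math.isnan(result):
        return ("exception", "invalid")
    if math.isinf(result):
        return ("exception", "overflow")   # overflow / divide-by-zero signal
    return ("done", result)               # completion signal + result

print(fpu("mul", 1.5, 2.0))  # → ('done', 3.0)
print(fpu("div", 1.0, 0.0))  # → ('exception', 'overflow')
```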

Convolution operation hardware accelerator and data processing method

The invention discloses a convolution-operation hardware accelerator in which a feature-map reorganization module segments each row of input feature data of the input feature map, each segment holding at most M values. The feature-map cache write-control module generates, for each piece of input feature data, a feature-map cache-unit sequence number and a cache write address, which are written into the feature-map cache module; at most M pieces of input feature data are written into the M feature-map cache units in parallel within the same clock cycle. The feature-map cache read-control module reads at most M pieces of input feature data in parallel in a single clock cycle during one convolution operation. The convolution-kernel cache module writes the convolution kernel of a single output channel into M convolution-kernel cache units, taking M pieces of kernel data as a group. The computation module performs the corresponding convolution of the kernel data with the input feature data. The hardware accelerator can thus support convolutions with different size parameters, and the computational efficiency of the convolution operation is improved.
Owner:HANGZHOU FEISHU TECH CO LTD
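
The point of the M-way banked cache is that M consecutive values land in M different banks, so they can be read back in one cycle. A sketch with a round-robin bank assignment (the interleaving policy is an assumption; the patent only fixes the segment size M):

```python
# Sketch of the banked feature-map cache: value i of a row goes to bank
# i mod M, so any M consecutive values sit in M distinct banks and can be
# read "in parallel" in a single cycle.
M = 4  # design parameter: number of cache banks

def write_banks(row):
    banks = [[] for _ in range(M)]
    for i, v in enumerate(row):
        banks[i % M].append(v)            # round-robin bank assignment
    return banks

def read_parallel(banks, start, count):
    # Gather `count` consecutive values beginning at `start`, one per bank.
    return [banks[(start + j) % M][(start + j) // M] for j in range(count)]

banks = write_banks(list(range(10)))
print(read_parallel(banks, 2, 4))  # → [2, 3, 4, 5]
```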

Matrix Multiplication Acceleration Method for Heterogeneous Fusion Architecture

The invention discloses a matrix multiplication acceleration method oriented to heterogeneous fusion architectures, aiming to design a universal matrix multiplication acceleration method for different many-core accelerator target architectures and improve the utilization of heterogeneous systems. According to the technical scheme, blocked matrix multiplication versions for the heterogeneous fusion architecture are first designed, comprising vcpu, vgpu, vmic, vscif, vcoi and vtarget; the multi-version matrix multiplications are then integrated and packaged into a heterogeneous-fusion library file, HU-xgemm; and finally HU-xgemm adapts itself to the accelerators in the heterogeneous fusion architecture. The invention can adapt to different target accelerators and processors, performing matrix multiplication according to the topology of the CPUs or accelerators in each heterogeneous fusion architecture and exploiting FMA parallel computation, which increases matrix multiplication speed and improves the utilization of the heterogeneous system.
Owner:NAT UNIV OF DEFENSE TECH
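
The common kernel that each target-specific version (vcpu, vgpu, vmic, ...) would specialize is a blocked matrix multiply. A minimal sketch, with the block size B as an assumed tuning parameter:

```python
# Blocked (tiled) matrix multiply: C = A @ B computed block by block so each
# tile stays in fast memory; the innermost multiply-add is the FMA site a
# target-specific version would vectorize.
B = 2  # assumed block size, tuned per target architecture

def gemm_blocked(Amat, Bmat):
    n, k, m = len(Amat), len(Bmat), len(Bmat[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, B):             # loop over blocks of C
        for j0 in range(0, m, B):
            for p0 in range(0, k, B):     # loop over blocks of A/B
                for i in range(i0, min(i0 + B, n)):
                    for j in range(j0, min(j0 + B, m)):
                        for p in range(p0, min(p0 + B, k)):
                            C[i][j] += Amat[i][p] * Bmat[p][j]  # FMA site
    return C

print(gemm_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19.0, 22.0], [43.0, 50.0]]
```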

Target detection full-process acceleration method and system based on Vitis stack

The invention discloses a full-flow target-detection acceleration method and system based on the Vitis stack. The method comprises the following steps: first fine-tuning the network, then quantizing the model with the Vitis quantizer and compiling it with the Vitis compiler, converting the original network model and reducing its data size; applying data-layout transformation and fixed-point data typing to the host program's data to increase data parallelism; then applying task-level parallelism and data-optimization directives to the whole flow to accelerate data operations; next creating the hardware accelerator and the system image with Vivado and PetaLinux respectively, integrating them into a hardware platform with Vitis, with the PS and PL sides communicating over an AXI bus; and finally combining the optimizations of the preceding steps into full-flow acceleration. The design can increase the throughput of a neural-network algorithm deployed on embedded edge devices and achieves a certain acceleration effect in the field of target detection.
Owner:SUN YAT SEN UNIV +2
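
One step of the flow, post-training quantization of weights to fixed point before compilation, can be sketched with a simple symmetric max-scaling scheme. The scaling rule is an assumption for illustration; the Vitis AI quantizer's actual calibration is more involved:

```python
# Sketch of symmetric post-training quantization: scale weights so the
# largest magnitude maps to the top of the signed 8-bit range, store the
# integer codes, and dequantize to see what the accelerator computes with.
def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]    # integer codes
    deq = [v * scale for v in q]               # values the hardware "sees"
    return q, deq

q, deq = quantize([0.1, -0.5, 1.27])
print(q)  # → [10, -50, 127]
```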