
89 results about "How to reduce the number of memory accesses" patented technology

Matrix convolution calculation method, interface, coprocessor and system based on RISC-V architecture

The invention discloses an instruction set, interface, and coprocessor for matrix convolution calculation based on the RISC-V instruction set architecture, together with the complete mechanism of the system. Traditional matrix convolution calculation is efficiently achieved through a combination of software and hardware: exploiting the extensibility of the RISC-V instruction set, a small number of instructions and a dedicated convolution calculation unit (namely the coprocessor) are designed. The number of memory accesses and the execution period of a matrix convolution instruction are reduced, the complexity of application-layer software calculation is lowered, the efficiency and speed of large matrix convolution calculation are improved, flexible calling by upper-layer developers is facilitated, and the coding design is simplified. Moreover, a processor designed with the RISC-V instruction set has great advantages in power consumption, size, and flexibility compared with ARM, X86, and other architectures, can adapt to different application scenarios, and has wide prospects in the field of artificial intelligence.
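As a reference for what such a dedicated convolution unit computes, here is a minimal pure-Python sketch of valid-mode 2D matrix convolution; the function name and loop structure are illustrative, not taken from the patent:

```python
def conv2d(matrix, kernel):
    """Valid-mode 2D convolution of `matrix` with `kernel` (lists of lists)."""
    mh, mw = len(matrix), len(matrix[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(mh - kh + 1):
        row = []
        for j in range(mw - kw + 1):
            # A software loop issues kh*kw separate loads per output element;
            # a fused convolution instruction can reduce these memory accesses.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += matrix[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out
```

The triple-nested inner loop makes visible why a coprocessor that keeps the kernel and an input window in local registers saves memory traffic.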
Owner:NANJING HUAJIE IMI TECH CO LTD

Table establishing and lookup method applied to network processor

The invention provides a table establishing and lookup method applied to a network processor. According to the method, different types of hash tables are established and used; the tables are unrelated and subjected to independent two-stage lookup. The method comprises the following steps: establishing different types of hash tables according to the entry size of the tables to be established; allocating a separate memory space to each table; assigning the size and first address of each table; during lookup, obtaining the search key, the type of the lookup table, and the first address of the table from information extracted from a message; performing two hash conversions on the key in parallel, wherein the primary conversion turns the key into an offset address that determines the key's index value in an index table, and the secondary conversion turns the key into a tag that distinguishes conflicting entries; and reading content from a result table according to the index value to perform matching and obtain the search result. The method effectively reduces the number of memory accesses, and further improves the lookup speed of the network processor and the memory resource utilization rate.
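The two-stage lookup can be sketched as follows; the specific hash choices (`crc32` for the offset, `adler32` for the tag) and all class and method names are illustrative assumptions, not the patent's actual hash functions:

```python
import zlib

class TwoStageHashTable:
    """Index table resolved by one hash; conflicts disambiguated by a second."""

    def __init__(self, size):
        self.size = size
        self.index_table = [[] for _ in range(size)]  # buckets of (tag, slot)
        self.result_table = []                        # actual stored entries

    def _offset(self, key):
        return zlib.crc32(key) % self.size            # primary conversion: offset address

    def _tag(self, key):
        return zlib.adler32(key)                      # secondary conversion: conflict tag

    def insert(self, key, value):
        slot = len(self.result_table)
        self.result_table.append(value)
        self.index_table[self._offset(key)].append((self._tag(key), slot))

    def lookup(self, key):
        tag = self._tag(key)
        for t, slot in self.index_table[self._offset(key)]:
            if t == tag:                              # tag distinguishes colliding keys
                return self.result_table[slot]
        return None
```

Note that in a real table the tag can itself collide, so production designs typically verify the full key on the final result read; this sketch omits that check for brevity.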
Owner:NO 32 RES INST OF CHINA ELECTRONICS TECH GRP

Algorithm parallel processing method and system based on heterogeneous many-core processor

The invention relates to an algorithm parallel processing method and system based on a heterogeneous many-core processor. The method takes code segments with large operation-time consumption in a serial program as parallel computing objects, performs task division according to their characteristics, and determines the division of work between the master core and the slave-core array, handing the time-consuming computation over to the slave-core array for execution. Each slave core actively acquires a task and the data used for calculation from main memory and returns the calculation result to the master core; the master core updates the main-memory data in an asynchronous serial mode to avoid data read-write errors caused by data dependence. To address the time cost of master-slave core communication, single data items are packaged in a structure to realize data packing, and the master core's main-memory data address is aligned to a 256 B boundary so that the granularity of a single data copy is not less than 256 B; this utilizes the bandwidth of a single core group to the maximum extent and optimizes data transmission performance. Hiding of communication time is realized by using a double-buffer mechanism, and the parallel efficiency is improved.
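The double-buffer mechanism can be modeled sequentially as below. In the real system the fetch of the next chunk would run as an asynchronous copy overlapping the current computation, which plain sequential Python cannot express, so this sketch only shows the buffer-alternation structure; all names are illustrative:

```python
def process_chunks(chunks, fetch, compute):
    """Double-buffer sketch: fetch(chunk) models the DMA copy into a buffer,
    compute(data) models the slave-core work on a filled buffer."""
    results = []
    if not chunks:
        return results
    buffers = [None, None]
    buffers[0] = fetch(chunks[0])                 # prefetch the first chunk
    for i in range(len(chunks)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(chunks):
            # In hardware this fetch overlaps the compute below, hiding
            # communication time behind computation.
            buffers[nxt] = fetch(chunks[i + 1])
        results.append(compute(buffers[cur]))
    return results
```

The key property is that at every step one buffer is being computed on while the other is being filled, so neither the cores nor the memory channel sit idle.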
Owner:OCEAN UNIV OF CHINA +1

Internal memory copying accelerating method and device facing multi-core microprocessor

The invention discloses an internal memory copying accelerating method and device for a multi-core microprocessor. The method comprises the following steps: an internal memory copying instruction and an MPI (Message Passing Interface) communication accelerating module are added to the microprocessor instruction set to identify the types of internal memory copying requests obtained by decoding; general internal memory copying requests are issued to an internal memory copying unit; MPI group communication requests or MPI point-to-point communication requests are issued to the MPI communication accelerating module; and the MPI communication accelerating module merges and executes associated internal memory copying requests to improve internal memory copying performance and execution efficiency. The device comprises a decoding unit, the internal memory copying unit, associated detecting parts, and the MPI communication accelerating module for executing the internal memory copying requests that constitute MPI group communication or MPI point-to-point communication. The method and device have the advantages of high memory copying efficiency, good multi-core optimization performance, low hardware design complexity, good compatibility, low power consumption, and simple hardware realization.
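The merging of associated copy requests can be illustrated with a simple coalescing rule: adjacent requests whose source and destination ranges are both contiguous are combined into one larger copy. This is a hypothetical model for illustration, not the patent's actual merge logic:

```python
def merge_copy_requests(reqs):
    """reqs: list of (src_addr, dst_addr, length) tuples in issue order.
    Returns a coalesced list where contiguous copies are merged."""
    merged = []
    for src, dst, n in reqs:
        if merged:
            psrc, pdst, pn = merged[-1]
            # Contiguous in both source and destination: extend the prior copy.
            if psrc + pn == src and pdst + pn == dst:
                merged[-1] = (psrc, pdst, pn + n)
                continue
        merged.append((src, dst, n))
    return merged
```

Fewer, larger copies mean fewer request setups and better use of wide memory buses, which is the performance effect the associated-request merging aims at.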
Owner:NAT UNIV OF DEFENSE TECH

Multiport register file circuit

The invention provides a multiport register file circuit comprising a write address decoder circuit, a read address decoder circuit, a first memory array, a second memory array, a first input data buffer circuit, a third input data buffer circuit, a first sense amplifier array, a third sense amplifier array, a second input data buffer circuit, a fourth input data buffer circuit, a second sense amplifier array and a fourth sense amplifier array, wherein the first memory array and the second memory array are respectively connected with the write address decoder circuit and the read address decoder circuit, the first input data buffer circuit and the third input data buffer circuit are mutually inverted and are connected with the first memory array, the first sense amplifier array and the third sense amplifier array are connected with the first memory array, the second input data buffer circuit and the fourth input data buffer circuit are mutually inverted and are connected with the second memory array, and the second sense amplifier array and the fourth sense amplifier array are connected with the second memory array. The multiport register file circuit can supply 17 read data ports and 9 write data ports at the same time, each port carrying a 32-bit data signal, so that it can be applied in a digital signal processor with a very-long-instruction-word architecture.
Owner:TSINGHUA UNIV

Byte code buffer device for improving instruction fetch bandwidth of Java processor and using method thereof

The invention relates to a byte code buffer device for improving the instruction fetch bandwidth of a Java processor and a method of using it. In the invention, a byte code register, a multi-path selection module and a byte code buffer are sequentially connected; the input end of the byte code register is connected with an instruction memory, and the output end of the byte code buffer is connected with the decoding stage of the Java processor; the input end of a control module is connected with the decoding stage of the Java processor, and the output end of the control module is respectively connected with the byte code register, the multi-path selection module and the byte code buffer; the byte code register has 32 bits, the byte code buffer has 64 bits, and the high 4 bytes of the byte code buffer are connected with the decoding stage of the Java processor. When the available space of the byte code buffer is not less than 4 bytes, the device reads 4 bytes from the register and transfers them through the multi-path selection module to the correct position in the buffer, so that the bytecode to be executed is always complete in the high bytes, thereby reducing the number of memory accesses and improving the instruction fetch bandwidth.
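The refill rule can be sketched as follows: whenever at least 4 bytes of the buffer are free, 4 more bytes are read from the instruction stream, so the next bytecode to execute is always fully present at the front of the buffer. The class and method names are illustrative, not from the patent:

```python
import itertools

class BytecodeBuffer:
    CAP = 8  # 64-bit buffer = 8 bytes

    def __init__(self, stream):
        self.stream = stream       # iterator yielding instruction bytes
        self.buf = bytearray()     # front of buf = "high bytes" fed to decode

    def refill(self):
        # Whenever >= 4 bytes are free, pull another 4-byte word in.
        while self.CAP - len(self.buf) >= 4:
            word = list(itertools.islice(self.stream, 4))
            if not word:
                break
            self.buf.extend(word)

    def fetch(self, n):
        """Consume an n-byte bytecode from the front of the buffer."""
        self.refill()
        op, self.buf = self.buf[:n], self.buf[n:]
        return bytes(op)
```

Because refill happens before each fetch, a multi-byte bytecode never straddles a memory read from the decoder's point of view, which is the bandwidth benefit the device claims.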
Owner:JIANGNAN UNIV

Task scheduling method and system based on task stealing

The invention discloses a task scheduling method and system based on task stealing. The method comprises the steps that a task dependence graph is constructed, and each dependent node is registered as a callback function into the callback container of the node it depends on; a lock-free double-ended queue is allocated to each thread in a thread pool and emptied, and root nodes are placed at the bottoms of the threads' lock-free double-ended queues in a polling mode; if a thread's lock-free double-ended queue is not empty, nodes are taken out of its bottom and executed; if a thread's lock-free double-ended queue is empty, nodes are stolen from the tops of other threads' lock-free double-ended queues, pushed into the bottom of the thread's own queue, and taken out for execution; after execution of all node tasks is completed, the in-degrees of the nodes in the task dependence graph are restored to their original values, and the blocking of the main thread ends. The method and system are oriented to large task-level parallel application programs and can effectively improve their performance.
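The stealing discipline can be modeled single-threadedly as below: each worker pops tasks from the bottom of its own deque and, when that deque is empty, steals from the top of another worker's deque. This is a simplified sequential sketch; a real implementation uses atomic operations for the lock-free deques:

```python
from collections import deque

def run_workers(task_lists):
    """Single-threaded model of task stealing across per-worker deques."""
    deques = [deque(tasks) for tasks in task_lists]   # right end = "bottom"
    done = []
    while any(deques):
        for dq in deques:
            if dq:
                done.append(dq.pop())                 # owner takes from its bottom
            else:
                victims = [d for d in deques if d]
                if victims:
                    dq.append(victims[0].popleft())   # thief steals from a victim's top
    return done
```

Taking from opposite ends is the design point: owners and thieves rarely contend for the same element, which is what makes a lock-free implementation practical.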
Owner:HUAZHONG UNIV OF SCI & TECH

FPGA graph processing acceleration method and system based on OpenCL

The invention discloses an FPGA graph processing acceleration method and system based on OpenCL, and belongs to the field of big data processing. The method comprises the following steps: generating a complete control data flow graph (CDFG) from the intermediate code (IR) obtained by disassembly; re-partitioning the complete CDFG according to Load and Store instructions to obtain new CDFG instruction blocks, and determining the parallel mode between the CDFG instruction blocks; analyzing the Load and Store instructions in all the new CDFG instruction blocks and determining a division mode for the BRAM on the FPGA chip; and reorganizing the on-chip memory of the FPGA according to the BRAM division mode, translating all new CDFG instruction blocks into the corresponding hardware description language according to the parallel mode among the instruction blocks, compiling to generate a binary file capable of running on the FPGA, and burning the binary file to the FPGA for running. By adopting a pipeline technique and readjusting the instructions within an instruction block, the number of memory accesses and the memory access latency are reduced; on-chip storage partitioning reduces write conflicts of different pipelines on the same memory block, thereby improving system efficiency.
Owner:HUAZHONG UNIV OF SCI & TECH

Near data stream computing acceleration array based on RISC-V

The invention provides a near data stream computing acceleration array based on RISC-V. The acceleration array comprises a RISC-V core and, arranged around it, an array composed of a plurality of coprocessors. Each coprocessor comprises an NOC routing control node, a RAM block and a multiply-add particle. The RAM block is used for caching the data to be calculated, the multiply-add particle is used for realizing multiply-accumulate calculation, and the NOC routing control node realizes interconnection with adjacent coprocessors on the one hand and is connected with the data RAM block and the multiply-add particle on the other hand. According to the method, the data to be calculated is dispersedly stored in a plurality of RAM blocks, and the multiply-add calculation operators are placed as close as possible to the RAMs. Adjacent coprocessors are interconnected via an on-chip network structure, and a producer-consumer relationship is realized in the calculation process. Therefore, one calculation process can be converted into a process in which a data stream, after being split and mapped, flows between the coprocessors of the acceleration array for calculation.
Owner:TIANJIN CHIP SEA INNOVATION TECH CO LTD +1

Cutting method based on Unity 3D model

A cutting method based on a Unity 3D model comprises the following steps: S1, obtaining a cutting surface of the model, wherein the model comprises vertexes and triangular surfaces formed by every three vertexes; S2, obtaining the triangular surfaces intersecting with the cutting surface in the model; S3, emitting rays from the vertex on one side of an intersecting triangular surface to the other two vertexes, obtaining the intersection points with the cutting surface, constructing first-class new triangular surfaces located on the two sides based on the intersection points, and generating new vertex information and first-class new triangular surface information from all the intersection points; S4, reordering the new vertex information to form a closed polygon on the profile, and generating a second class of new triangular surfaces to fill the profile; and S5, cloning the original model to generate a new model, covering the original model and the new model with the vertex information on the two sides of the cutting surface respectively to generate independent sub-models on the two sides of the cutting surface, and moving the sub-models to realize a cutting separation effect. Flexible and free cutting and separating effects can be achieved, the cutting experience is improved, the model cutting operation is simplified, and the cutting process performance is improved.
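Step S3 hinges on finding where an edge of an intersecting triangle crosses the cutting plane. A minimal sketch of that edge-plane intersection by linear interpolation follows; the function names are illustrative and are not Unity API calls:

```python
def signed_distance(p, plane_point, plane_normal):
    """Signed distance of point p from the plane (positive on the normal side)."""
    return sum((p[i] - plane_point[i]) * plane_normal[i] for i in range(3))

def edge_plane_intersection(a, b, plane_point, plane_normal):
    """Intersection of segment a-b with the plane, or None if no crossing."""
    da = signed_distance(a, plane_point, plane_normal)
    db = signed_distance(b, plane_point, plane_normal)
    if da * db >= 0:              # both endpoints on the same side (or on the plane)
        return None
    t = da / (da - db)            # interpolation parameter of the crossing
    return tuple(a[i] + t * (b[i] - a[i]) for i in range(3))
```

Each intersection point found this way becomes a new vertex; the lone vertex on one side plus the two intersection points form one new triangle, and the two vertexes on the other side plus the intersection points form the other two.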
Owner:GUANGDONG VTRON TECH CO LTD