50 results about "Lower memory access latency" patented technology

Mobile terminal real-time rendering system and method based on cloud platform

The invention discloses a cloud-platform-based real-time rendering system and method for mobile terminals. The method comprises the steps of: receiving viewpoint and interaction information transmitted by a mobile terminal, then querying and reading model and scene files to obtain three-dimensional scene data; partitioning the three-dimensional scene data into model-group data according to the model-group types of the three-dimensional scene; storing the model-group data and automatically adjusting its storage locations according to the differing data demands of different scenes; and extracting the model-group data, building and managing MIC/GPU rendering tasks for the three-dimensional scene image, obtaining the rendering result data, and compressing that data before transmitting it to the mobile terminal. A dynamic load strategy is employed to deploy and manage the MIC/GPU rendering tasks, guaranteeing load balance across the cloud servers. The method minimizes the computing power and storage space consumed on the mobile client.
Owner:SHANDONG UNIV
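
A minimal Python sketch of the server-side flow the abstract describes may help: partition the scene by model group, dispatch each group's rendering to the least-loaded node (the dynamic load strategy), and compress the result before returning it to the mobile terminal. All names here (RenderNode, partition_by_model_group, the render stub) are hypothetical stand-ins for the patent's MIC/GPU rendering tasks, not its actual implementation.

import zlib
from dataclasses import dataclass

@dataclass
class RenderNode:
    name: str
    load: int = 0          # number of rendering tasks currently assigned

def partition_by_model_group(scene_data):
    """Split scene data into per-model-group lists."""
    groups = {}
    for model in scene_data:
        groups.setdefault(model["group"], []).append(model)
    return groups

def render(group_models, viewpoint):
    # Stand-in for the actual MIC/GPU rendering kernel.
    return f"pixels({len(group_models)} models @ {viewpoint})".encode()

def handle_request(viewpoint, scene_data, nodes):
    """Serve one mobile-terminal frame request with dynamic load balancing."""
    frame = b""
    for group, models in partition_by_model_group(scene_data).items():
        node = min(nodes, key=lambda n: n.load)   # pick the least-loaded node
        node.load += 1
        frame += render(models, viewpoint)
        node.load -= 1
    return zlib.compress(frame)                   # compress before transmission

# Usage
nodes = [RenderNode("gpu-0"), RenderNode("mic-0")]
scene = [{"group": "terrain"}, {"group": "terrain"}, {"group": "avatars"}]
payload = handle_request(viewpoint=(0, 0, 5), scene_data=scene, nodes=nodes)
print(len(payload), "bytes sent to mobile terminal")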

Sparse tensor canonical decomposition method based on data division and calculation distribution

The invention relates to a sparse tensor canonical decomposition method based on data division and task allocation. The method comprises the following steps: performing multi-level division and task allocation across the processing cores of a core group according to the many-core characteristics of the SW processor; performing multi-level partitioning of the sparse tensor data; designing a communication strategy for sparse tensor canonical decomposition that exploits the register-communication features of the SW26010 processor; and, targeting the common performance bottleneck of the different sparse tensor canonical decomposition methods, namely the differing requirements of the matricized tensor times Khatri-Rao product (MTTKRP) during the actual computation (whether tensor elements must be randomly extracted for calculation), designing different MTTKRP computation schemes that exploit the characteristics of the SW processor. The method fully exploits the characteristics of the SW architecture and fully considers the computational requirements of sparse tensor decomposition, so that multiple sparse tensor canonical decomposition methods can run efficiently in parallel on the SW architecture with dynamic load balance guaranteed to the maximum extent.
Owner:BEIHANG UNIV
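
The abstract's central kernel, MTTKRP, is standard in CP (canonical) decomposition, so a serial reference version can clarify what the patent parallelizes. This is not the SW26010 register-communication scheme, only the mathematical operation on a COO-format sparse tensor; the random access into factor rows B[j] and C[k] is exactly the "random extraction" requirement the abstract mentions.

import numpy as np

def mttkrp_mode0(coords, vals, B, C, n_rows):
    """Mode-0 MTTKRP for a 3-way sparse tensor in COO form:
    M[i, :] = sum over nonzeros (i, j, k) of X[i,j,k] * (B[j,:] * C[k,:])."""
    rank = B.shape[1]
    M = np.zeros((n_rows, rank))
    for (i, j, k), v in zip(coords, vals):
        M[i] += v * B[j] * C[k]   # elementwise (Hadamard) row product
    return M

# Tiny example: a 4x3x2 sparse tensor with 3 nonzeros, rank-2 factors
coords = [(0, 1, 0), (2, 0, 1), (3, 2, 1)]
vals = [1.0, 2.0, -1.0]
rng = np.random.default_rng(0)
B, C = rng.random((3, 2)), rng.random((2, 2))
print(mttkrp_mode0(coords, vals, B, C, n_rows=4))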

Correction of incorrect cache accesses

The application describes a data processor operable to process data and comprising: a cache in which the storage location of a data item within said cache is identified by an address, said cache comprising a plurality of storage locations and said data processor comprising a cache directory operable to store a physical address indicator for each storage location holding stored data; a hash value generator operable to generate a hash value from at least some of the bits of said address, said generated hash value having fewer bits than said address; and a buffer operable to store a plurality of hash values relating to said plurality of storage locations within said cache. In response to a request to access said data item, said data processor is operable to compare said generated hash value with at least some of said plurality of hash values stored within said buffer and, in response to a match, to indicate an indicated storage location of said data item; said data processor is further operable to access the physical address indicator stored within said cache directory corresponding to said indicated storage location and, in response to said accessed physical address indicator not indicating said address, to invalidate said indicated storage location within said cache.
Owner:ARM LTD +1
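
A toy software model may clarify the claim language: a small hash predicts which cache slot holds the data, the directory's physical address confirms or refutes the prediction, and a refuted (aliased) prediction invalidates the indicated slot. The hash function and fully associative probe below are illustrative assumptions, not ARM's hardware design.

def small_hash(address, bits=8):
    """Hash with fewer bits than the address (illustrative choice of hash)."""
    return (address ^ (address >> bits)) & ((1 << bits) - 1)

class HashPredictedCache:
    def __init__(self, n_slots=4):
        self.data = [None] * n_slots        # cached data items
        self.directory = [None] * n_slots   # physical address per slot
        self.hashes = [None] * n_slots      # buffer of small hash values

    def fill(self, slot, address, value):
        self.data[slot] = value
        self.directory[slot] = address
        self.hashes[slot] = small_hash(address)

    def access(self, address):
        h = small_hash(address)
        for slot, stored in enumerate(self.hashes):
            if stored == h:                          # predicted hit
                if self.directory[slot] == address:  # directory confirms
                    return self.data[slot]
                # False hit (hash alias): invalidate the indicated slot.
                self.data[slot] = self.directory[slot] = self.hashes[slot] = None
                return None
        return None  # miss

cache = HashPredictedCache()
cache.fill(0, address=0x1234, value="A")
print(cache.access(0x1234))  # "A": hash match confirmed by the directory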

AMBA interface circuit

Status: Inactive | Publication: CN101710310A | Benefits: reduced bus latency and memory access latency; resource savings | Classification: electric digital data processing | Related concepts: embedded system, network-on-chip
The invention relates to an AMBA interface circuit characterized in that three FIFOs are arranged in the Master interface circuit. The Write Data FIFO and the Write Address FIFO receive data and addresses transmitted by the master device: if the master device has not yet been granted the bus, the data or address is first written into the Write Data FIFO or Write Address FIFO and is transmitted once the master device obtains the right to use the bus. The Read Data FIFO sends data to the master device: when the master device is busy, data transmitted by a slave device is stored temporarily in the Read Data FIFO, the bus is then released, and the data is delivered once the master device can receive it. Compared with the prior art, the invention has the following advantages: first, because the FIFOs are arranged in the Master interface circuit, the operation of the master and slave devices can overlap with the transmission of data or addresses, shortening bus waiting time and memory access latency; second, the FIFOs save resources when the master and slave devices transmit data or addresses; and third, data loss is avoided when the Master interface circuit is used for network-on-chip transmission.
Owner:EAST CHINA INST OF OPTOELECTRONICS INTEGRATEDDEVICE
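
The three FIFOs are easiest to see as queues decoupling the master device from bus arbitration. The following Python model is a behavioral sketch only (the class and method names are invented), not the AMBA signal-level protocol.

from collections import deque

class MasterInterface:
    """Toy model of the patent's three FIFOs decoupling a master from the bus."""
    def __init__(self):
        self.write_data = deque()    # Write Data FIFO
        self.write_addr = deque()    # Write Address FIFO
        self.read_data = deque()     # Read Data FIFO

    def master_write(self, addr, data):
        # Master posts the transfer and continues even without bus ownership.
        self.write_addr.append(addr)
        self.write_data.append(data)

    def drain_to_bus(self, bus_granted):
        # Once the bus is granted, buffered transfers go out back-to-back.
        sent = []
        while bus_granted and self.write_addr:
            sent.append((self.write_addr.popleft(), self.write_data.popleft()))
        return sent

    def slave_response(self, data):
        # Slave data is parked here so the bus can be released immediately.
        self.read_data.append(data)

    def master_read(self):
        return self.read_data.popleft() if self.read_data else None

m = MasterInterface()
m.master_write(0x100, 0xAB)       # master not yet granted the bus
m.master_write(0x104, 0xCD)
print(m.drain_to_bus(True))       # bus granted: both transfers complete
m.slave_response(0xEE)            # master busy; data waits in the Read Data FIFO
print(m.master_read())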

Pipelined convolution computing architecture design method and residual network acceleration system

The invention provides a pipelined convolution computing architecture design method and a residual network acceleration system. In the method, the hardware acceleration architecture is divided into an on-chip buffer, a convolution processing array, and a point-wise addition module. The main path of the architecture consists of three convolution processing arrays arranged in series, with two pipeline buffers inserted between them to implement an inter-layer pipeline across the three convolution layers of the main path. A fourth convolution processing array is provided to process, in parallel, the 1 x 1 convolution layers on the branches of the residual building blocks; by configuring a register in the fourth array, its working mode can be changed so that it can also compute the head convolution layer or the fully connected layer of the residual network, and when a residual building block's branch contains no convolution, the fourth array is bypassed and no convolution is executed. The point-wise addition module adds the output features of the residual building block's main path and its branch shortcut connection element by element.
Owner:SUN YAT SEN UNIV
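
A schematic of the dataflow may help: three convolution arrays in series form the main path, a fourth array serves (or is bypassed on) the residual branch, and a point-wise addition merges the two. The stub below models only the scheduling, not the cycle-level inter-layer pipeline or real convolution arithmetic; all names are hypothetical.

import numpy as np

def conv_array(x, kernel_size):
    # Stand-in for one convolution processing array; real hardware would
    # convolve with a kernel_size x kernel_size kernel.
    return x * 1.0

def residual_block(x, branch_has_conv):
    """Main path: three conv arrays in series with pipeline buffers between
    them; branch: the fourth array handles the 1x1 conv, or is bypassed."""
    buf1 = conv_array(x, 3)          # array 1 -> pipeline buffer 1
    buf2 = conv_array(buf1, 3)       # array 2 -> pipeline buffer 2
    main = conv_array(buf2, 3)       # array 3
    if branch_has_conv:
        shortcut = conv_array(x, 1)  # fourth array: 1x1 branch convolution
    else:
        shortcut = x                 # bypass the fourth array entirely
    return main + shortcut           # point-wise addition module

x = np.ones((4, 4))
print(residual_block(x, branch_has_conv=False).sum())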