32 results for "Increase Computational Parallelism" patented technology

Neural network accelerator suitable for edge equipment and neural network acceleration calculation method

The invention discloses a neural network accelerator suitable for edge equipment and a neural network acceleration calculation method, and relates to the technical field of neural networks. The accelerator comprises a configuration unit, a data buffer unit, a processing matrix component (PMs) and a post-processing unit. A main controller writes the feature parameters of different types of network layers into a register of the configuration unit, controlling how the operation logic of each layer type is mapped onto the processing matrix hardware. The processing matrix component is thereby multiplexed: a single hardware circuit accelerates the operations of different layer types in the neural network, including standard convolution layers and pooling layers, without additional hardware resources. The multiplexed accelerator delivers the same functionality with lower hardware resource consumption and power consumption, a higher hardware multiplexing rate, high concurrency, and strong structural expansibility.
Owner:上海赛昉科技有限公司
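
A minimal software sketch of this multiplexing idea, assuming a toy 2x2 window, stride 2, and a hypothetical mode flag standing in for the configuration register (none of these names come from the patent):

    import numpy as np

    def processing_matrix(window, weights, mode):
        # One pass of the shared processing-matrix datapath over a window.
        # "mode" stands in for the configuration register written by the
        # main controller (hypothetical encoding).
        if mode == "conv":            # standard convolution layer: MAC
            return float(np.sum(window * weights))
        if mode == "maxpool":         # pooling layer reuses the same path
            return float(np.max(window))
        raise ValueError(mode)

    def run_layer(feature_map, weights, mode, k=2, stride=2):
        h, w = feature_map.shape
        oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
        out = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                win = feature_map[i*stride:i*stride+k, j*stride:j*stride+k]
                out[i, j] = processing_matrix(win, weights, mode)
        return out

    fm = np.arange(16.0).reshape(4, 4)
    print(run_layer(fm, np.ones((2, 2)), "conv"))      # convolution mapping
    print(run_layer(fm, None, "maxpool"))              # pooling, same datapath

Only the small mode check differs between the two layer types; the window traversal and accumulation path are shared, which is the claimed source of hardware reuse.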

Parallel intra-frame prediction method of 8*8 sub-macroblocks in H.265/HEVC

The invention discloses a parallel intra-frame prediction method for 8*8 sub-macroblocks in H.265/HEVC. The method comprises the following steps: unifying the intra-frame prediction formulas into a single form, establishing a coefficient table and a reference position table, and executing the parallel intra-frame prediction itself. The unified formula and the two tables are formulated according to the characteristics of CUDA and the intra-frame prediction computational formulas, so that the 64 pixels of an 8*8 sub-macroblock can be predicted under all 35 prediction modes by one uniform prediction formula. This satisfies the single-instruction multiple-data requirement of CUDA multi-threading, realizes fine-grained parallel intra-frame prediction within the sub-macroblock, and eliminates the many branch statements that would otherwise degrade parallel performance. Pixel-level parallelism is achieved in the intra-frame prediction process, so the many-core resources of the GPU can be used effectively to accelerate prediction and shorten encoding time.
Owner:HUAZHONG UNIV OF SCI & TECH
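
A CPU-side sketch of the table-driven formulation, with toy coefficient and reference-position tables (the real HEVC tables and the CUDA kernel are not reproduced here): every (pixel, mode) pair is evaluated by one uniform weighted-sum formula with no branches, so each evaluation can map to one GPU thread.

    import numpy as np

    N_PIX, N_MODES = 64, 35                    # 8x8 sub-macroblock, 35 modes
    ref = np.arange(33, dtype=np.int32)        # toy reference-pixel line

    # Toy tables; in the method these encode, per (pixel, mode), which two
    # reference samples to read and their interpolation weight.
    rng = np.random.default_rng(0)
    pos = rng.integers(0, 32, size=(N_PIX, N_MODES))    # reference position table
    coef = rng.integers(0, 33, size=(N_PIX, N_MODES))   # coefficient table (0..32)

    # One unified formula for all 64*35 predictions -- no per-mode branches,
    # so all 2240 evaluations are independent SIMD/CUDA work items.
    pred = ((32 - coef) * ref[pos] + coef * ref[pos + 1] + 16) >> 5
    print(pred.shape)                          # (64, 35): all pixels x all modes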

Model-parallel full-connected layer data exchange method and system for deep neural network

The invention discloses a model-parallel fully-connected-layer data exchange method and system for a deep neural network. The fully connected layer of the deep neural network is evenly divided into N training units according to the number of neurons, forming a network model in which the fully connected layer is model-parallel. During forward propagation of the fully connected layer, a half-stop-and-wait forward propagation method processes the input data of the preceding layer in the pattern of partial arrival, partial calculation, overall output and overall propagation. During backward propagation, a quantified half-stop-and-wait backward propagation method processes the residual-error data of the following layer in the pattern of quantified arrival, quantified calculation and quantified propagation. After each forward-backward pass, the weight data and threshold data of each layer are updated in parallel according to the computed weight gradients and threshold gradients. The method overlaps the data communication and data computation of the fully connected layer, accelerating model convergence while preserving accuracy.
Owner:HUAZHONG UNIV OF SCI & TECH
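
A minimal sketch of the "partial arrival, partial calculation" overlap during forward propagation, using a thread and a queue to stand in for network transfers; the dimensions and names are illustrative only:

    import numpy as np
    from queue import Queue
    from threading import Thread

    # One training unit holding a slice W of the fully connected layer.
    N_PARTS, IN_DIM, OUT_DIM = 4, 8, 3
    W = np.random.rand(IN_DIM, OUT_DIM)
    chunks = Queue()

    def receiver():
        # Stands in for the interconnect: input from the preceding layer
        # arrives chunk by chunk instead of as one blocking transfer.
        x = np.random.rand(IN_DIM)
        for part in np.split(x, N_PARTS):
            chunks.put(part)
        chunks.put(None)

    Thread(target=receiver).start()
    acc, row = np.zeros(OUT_DIM), 0
    while (part := chunks.get()) is not None:
        # Compute on each partial input as soon as it arrives, overlapping
        # the remaining communication with this matrix-vector work.
        acc += part @ W[row:row + len(part)]
        row += len(part)
    print(acc)            # "overall output" once every partial has arrived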

Processing method and device for multiplication and accumulation operation

The invention discloses a processing method and device for multiply-accumulate operations, addressing the low data processing efficiency and high power consumption of computers in the prior art. The method comprises the following steps: allocating a register identifier to each fetched multiply-accumulate instruction; after processing each instruction to obtain an add operand, caching the add operand together with the allocated register identifier as a tuple; reading one tuple as the reference tuple and taking its add operand as the first add operand; taking as the second add operand either the add operand of an associated tuple or the data in the register named by the reference tuple's register identifier; generating an addition result from the first and second add operands; and storing the result back to the source of the second add operand. In this way the computational parallelism, data throughput and data processing efficiency are increased, and the power consumption of the computer is reduced.
Owner:HONOR DEVICE CO LTD
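
A rough sketch of the tuple-caching scheme, assuming a toy instruction stream; pairing cached tuples that target the same register exposes independent additions, which real hardware would issue in parallel rather than in this sequential loop:

    from collections import defaultdict

    regs = defaultdict(int)          # destination registers
    pending = []                     # cached (add_operand, register_id) tuples

    def issue_mac(a, b, reg_id):
        # Multiply stage: each MAC instruction yields an add operand tagged
        # with the register it targets, cached instead of added immediately.
        pending.append((a * b, reg_id))

    for a, b, r in [(2, 3, 0), (4, 5, 0), (1, 7, 1), (6, 2, 0)]:
        issue_mac(a, b, r)

    by_reg = defaultdict(list)
    for operand, reg_id in pending:
        by_reg[reg_id].append(operand)
    for reg_id, ops in by_reg.items():
        while len(ops) >= 2:         # pair two cached tuples: tuple + tuple
            ops.append(ops.pop() + ops.pop())
        regs[reg_id] += ops.pop()    # last operand folds into the register
    print(dict(regs))                # {0: 38, 1: 7}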

FPGA hardware implementation method and device based on Bayesian resampling particle filtering and a target tracking method

The invention discloses an FPGA hardware implementation method and device based on Bayesian-resampling particle filtering, and a target tracking method. The implementation method comprises the following steps: a particle sampling unit reads old particles from the particle cache blocks, receives random numbers from a random number generator, and samples and updates the read particles in parallel; a weight updating unit reads the observed value, computes weights for the updated particles in parallel, and stores the generated weights into the weight cache blocks; a Bayesian resampling unit resamples in parallel according to all weight values in the weight cache blocks and stores the index output values back into the corresponding index cache blocks; a pseudo-random permutation generator reads the addresses of new particles from the index cache blocks and randomly allocates the new particles to the particle cache blocks, realizing particle exchange in parallel computing; and these steps are executed cyclically until all time steps have been iterated and the state estimation of the system is complete. The invention improves the calculation speed of the particle filter system.
Owner:HUNAN NORMAL UNIVERSITY
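
A software analogue of the pipeline with a toy 1-D state model; plain multinomial resampling stands in for the patent's Bayesian resampler, so this only shows where each unit slots into the loop:

    import numpy as np

    rng = np.random.default_rng(1)
    N, STEPS = 1000, 20
    particles = rng.normal(0.0, 1.0, N)          # "particle cache blocks"

    for t in range(STEPS):
        # Particle sampling unit: propagate all old particles in parallel.
        particles = particles + rng.normal(0.0, 0.1, N)
        # Weight update unit: likelihood of each particle given the observation.
        obs = 0.05 * t + rng.normal(0.0, 0.1)
        w = np.exp(-0.5 * ((particles - obs) / 0.1) ** 2)
        w /= w.sum()
        # Resampling stage (multinomial here, not the patented Bayesian method).
        idx = rng.choice(N, size=N, p=w)
        # Pseudo-random permutation: shuffle new particles across cache blocks.
        particles = rng.permutation(particles[idx])

    print(particles.mean())          # state estimate after all time steps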

In-memory computing circuit based on local multiplication-global addition structure, memory and equipment

The invention provides an in-memory computing circuit based on a local-multiplication, global-addition structure. The circuit comprises a plurality of sub-computing arrays, word lines, bit lines, complementary bit lines and source lines; the sub-computing arrays in each row share a common word line, and the sub-computing arrays in each column share a common bit line, complementary bit line and source line. Each computing unit in a sub-computing array comprises a first transistor, a first memory, a second transistor, a second memory, a third transistor and a fourth transistor. The gates of the first and second transistors are connected to the word line and their sources to the source line; the drain of the first transistor is connected to the first memory, whose other end is connected to the bit line; the drain of the second transistor is connected to the second memory, whose other end is connected to the complementary bit line. The gate of the third transistor is connected to the source line, its source is grounded, and its drain is connected to the source of the fourth transistor; the gate of the fourth transistor is connected to the input line, and its drain is connected to the calculation line through a switch. The invention also provides a memory and electronic equipment.
Owner:INST OF MICROELECTRONICS CHINESE ACAD OF SCI
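
The sketch below mimics only the dataflow of the structure, not the transistor-level circuit: each sub-computing array multiplies its stored bits by the shared inputs locally, and the per-array products are then summed globally (all sizes and values are made up):

    import numpy as np

    rng = np.random.default_rng(2)
    N_ARRAYS, CELLS = 4, 8
    stored = rng.integers(0, 2, size=(N_ARRAYS, CELLS))   # memory cells (0/1)
    inputs = rng.integers(0, 2, size=CELLS)               # shared input lines

    local_products = stored & inputs          # local multiplication per array
    column_sums = local_products.sum(axis=0)  # global addition across arrays
    print(column_sums)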

N level sub-pixel search method based on whole pixel searching result

The invention belongs to the technical field of digital video coding, and particularly to an n-level sub-pixel search method based on the integer-pixel search result, and a corresponding device. The coordinate position of the top-left pixel of the block best matching at integer-pixel precision in the reference picture is selected as the center, and an n-level sub-pixel search of variable complexity is then performed, where n is not less than 2. The block-matching error function values of the candidate sub-pixel positions and of the integer-pixel position are obtained and compared, yielding the optimal top-left coordinate of the matching block. Because each higher-level sub-pixel search is independent of the results of the lower levels, computational parallelism between different sub-pixel levels is increased, coding time is saved, and hardware implementations are accelerated; a device based on the method can make full use of the parallelism between sub-pixel levels to increase hardware processing speed.
Owner:ZHEJIANG UNIV
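
A toy SAD search illustrating the level independence: half-pel and quarter-pel candidates are both taken as offsets around the same integer-pel best position, so neither level waits on the other (random arrays stand in for interpolated reference blocks):

    import numpy as np

    rng = np.random.default_rng(3)
    block = rng.random((4, 4))                    # current block to match

    # Offsets around the SAME integer-pel best position: level 1 (half-pel)
    # and level 2 (quarter-pel) candidates never depend on each other.
    offsets = [(0.0, 0.0)]                        # the integer-pel best itself
    for step in (0.5, 0.25):                      # level 1, then level 2
        offsets += [(dx * step, dy * step)
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]

    # Stand-ins for the interpolated reference blocks at each sub-pel offset;
    # real sub-pixel interpolation is omitted here.
    refs = {off: rng.random((4, 4)) for off in offsets}

    # All 17 block-matching errors are mutually independent, so both levels
    # can be evaluated in parallel; one final compare picks the best position.
    sads = {off: float(np.abs(block - ref).sum()) for off, ref in refs.items()}
    print(min(sads, key=sads.get))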

Packaging box information code optical identification and comparison method and system

The invention provides a packaging box information code optical identification and comparison method and system. A photoelectric switch triggers the optical code reader; the optical code reader reads and identifies an information code; a cloud storage unit module stores it in the cloud; and a central control unit module decodes and compares the information. The central control unit constructs a convolutional neural network model, trains and adapts it on two different data sets, and after training compares the information codes subsequently read from packaging boxes. If a compared information code carries fewer information types than the actual ones, a signal is sent to an audible and visual alarm, which issues an alarm prompt. After the convolutional neural network model with a Softmax layer is constructed, data adaptation and separation of the two training sets are performed, which improves computational parallelism and raises recognition precision with fewer parameters and a lower computation load; post-processing the classification probabilities of the convolutional neural network further improves the recognition precision of the model.
Owner:杭州胜铭纸业有限公司
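
A hedged sketch of the probability post-processing and comparison step, with made-up field classes and logits standing in for the CNN's Softmax outputs; an alarm fires when fewer information types are recognized than expected:

    import numpy as np

    EXPECTED_FIELDS = {"batch", "date", "serial"}   # info the code must carry

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Stand-in for the CNN's Softmax scores, one row per detected field region
    # (the real model and its classes are not specified in the abstract).
    classes = ["batch", "date", "serial", "noise"]
    logits = np.array([[4.0, 0.1, 0.2, 0.1],
                       [0.2, 3.5, 0.1, 0.3]])

    # Post-process the class probabilities: keep only confident detections.
    probs = np.apply_along_axis(softmax, 1, logits)
    found = {classes[i] for i, p in zip(probs.argmax(axis=1), probs.max(axis=1))
             if p > 0.8}                            # confidence threshold
    if found < EXPECTED_FIELDS:                     # fewer types than actual
        print("alarm: missing", EXPECTED_FIELDS - found)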

Bistatic forward-looking SAR image geometric correction method and device

The invention provides a bistatic forward-looking SAR image geometric correction method and device based on a DSP and an FPGA. The method comprises the following steps: the DSP sends parameter data to the FPGA; the FPGA establishes a two-dimensional virtual matrix on the ground plane and is provided with k slant-ground pixel correspondence calculation modules; k unprocessed pixel points and their coordinates are extracted from the two-dimensional virtual matrix; the acquired coordinates are input simultaneously into the correspondence calculation modules, each module processing the coordinates of one unprocessed pixel point; the calculation results of the modules are combined into a slant-ground projection conversion matrix; and after the DSP completes range-direction and azimuth-direction processing of the echo and obtains the image of the imaging slant plane, that image is corrected into the ground-plane image using the slant-ground projection conversion matrix.
Owner:成都汇蓉国科微系统技术有限公司
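
A sketch of the k-way parallel pixel mapping with a placeholder geometry function; the real correspondence is computed from the radar parameter data sent by the DSP, which is not given in the abstract:

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    K = 8                                  # number of parallel mapping modules
    H, W = 4, 6                            # ground-plane virtual matrix size

    def slant_ground_map(coord):
        # Placeholder for the slant/ground correspondence computed by one
        # FPGA module; the real geometry uses the radar parameters.
        i, j = coord
        return 0.9 * i + 0.1 * j, 0.2 * i + 1.1 * j

    coords = [(i, j) for i in range(H) for j in range(W)]
    proj = np.zeros((H, W, 2))
    with ThreadPoolExecutor(max_workers=K) as pool:
        # Up to K unprocessed pixels are in flight at once, one per module.
        for (i, j), res in zip(coords, pool.map(slant_ground_map, coords)):
            proj[i, j] = res               # assemble the projection matrix
    print(proj[1, 2])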

Intelligent multi-thread clustering method and device and computer readable storage medium

The invention relates to artificial intelligence technology and discloses an intelligent multi-thread clustering method, which comprises the following steps: receiving n data sample sets and a cluster number K input by a user; randomly determining K cluster centers according to K; randomly dividing the n data sample sets into K blocks and inputting them into K data modules; reading the sample sets in the K data modules with K threads, calculating the loss value between the K cluster centers and the n data sample sets, and comparing it with a preset threshold; when the loss value is greater than the preset threshold, re-determining the K cluster centers, recalculating the loss value and comparing again; and when the loss value is less than the preset threshold, outputting the K cluster centers as the clustering result. The invention further provides an intelligent multi-thread clustering device and a computer-readable storage medium, and can realize an accurate intelligent multi-thread clustering function.
Owner:CHINA PING AN PROPERTY INSURANCE CO LTD
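
A compact multi-threaded sketch of the method, with K threads each scoring one data block per iteration; here convergence is tested on the change in loss, whereas the patent compares the loss itself against a preset threshold:

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    rng = np.random.default_rng(4)
    X = rng.random((600, 2))                     # the n data samples
    K, THRESHOLD = 3, 1e-4
    centers = X[rng.choice(len(X), K, replace=False)]
    blocks = np.array_split(rng.permutation(X), K)   # K random data modules

    def block_loss(args):
        # One thread: assign its block's samples to the nearest center and
        # return that block's loss plus per-center sums for the update.
        block, ctrs = args
        d = np.linalg.norm(block[:, None] - ctrs[None], axis=2)
        a = d.argmin(axis=1)
        sums = np.array([block[a == k].sum(axis=0) for k in range(K)])
        counts = np.bincount(a, minlength=K)
        return d.min(axis=1).sum(), sums, counts

    prev = np.inf
    with ThreadPoolExecutor(max_workers=K) as pool:
        while True:
            parts = list(pool.map(block_loss, [(b, centers) for b in blocks]))
            loss = sum(p[0] for p in parts)
            sums = sum(p[1] for p in parts)
            counts = sum(p[2] for p in parts)
            centers = sums / np.maximum(counts, 1)[:, None]  # re-determine centers
            if abs(prev - loss) < THRESHOLD:                 # converged: stop
                break
            prev = loss
    print(centers)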
