80 results about "Data parallelism" patented technology

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied to regular data structures like arrays and matrices by working on each element in parallel. It contrasts with task parallelism, another form of parallelism.
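The definition above can be made concrete with a minimal sketch: the same elementwise operation is applied to disjoint chunks of an array by separate worker processes, and the partial results are reassembled. Python's `multiprocessing` stands in here for any parallel computing environment.

```python
# Minimal sketch of data parallelism: the identical operation (squaring) is
# applied to different chunks of an array on separate worker processes.
from multiprocessing import Pool

def square_chunk(chunk):
    # Each worker runs the same code on its own slice of the data.
    return [x * x for x in chunk]

def parallel_square(data, workers=4):
    # Split the data into roughly equal chunks, one per worker.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        results = pool.map(square_chunk, chunks)
    # Reassemble the partial results in their original order.
    return [x for part in results for x in part]

if __name__ == "__main__":
    print(parallel_square(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The distribution step (chunking plus `pool.map`) is the essence of the technique: every worker executes the same program, only the data differs.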

Intermediate representation method and device for neural network model calculation

The invention discloses an intermediate representation method and device oriented to neural network model computation. The method comprises the following steps: S1, parsing an input model file to obtain the topological structure information of a neural network; S2, constructing a logical computation graph; S21, inferring the physical layout information of each operator in the logical computation graph; S22, inferring the meta-attributes of each operator in the logical computation graph; S23, inferring the description information of the input and output logical tensors of each operator in the logical computation graph; S3, constructing a physical computation graph; S31, generating the physical computation graph. The meta-attribute-based intermediate representation for neural network model computation natively supports data parallelism, model parallelism, and pipeline parallelism at the operator level. The method and device take computation expressions as basic units and tensors as the data flowing through the computation graph formed by those expressions, realizing the computation process of the neural network model compositionally.
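The operator-level meta-attribute idea can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the class names, the `"split"`/`"broadcast"` layout labels, and the logical-to-physical expansion are all assumptions chosen to show how a per-operator layout attribute can express data parallelism in an intermediate representation.

```python
# Hypothetical sketch of an operator-level IR where each operator carries a
# meta-attribute per input tensor describing its layout across devices:
# "split" (sharded along axis 0, i.e. data parallelism) or "broadcast"
# (replicated, e.g. weights). All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    shape: tuple

@dataclass
class Operator:
    name: str
    inputs: list
    outputs: list
    layouts: dict = field(default_factory=dict)  # tensor name -> layout

def logical_to_physical(op, num_devices):
    """Expand one logical operator into per-device physical operators,
    sharding 'split' tensors along axis 0 and replicating the rest."""
    physical = []
    for dev in range(num_devices):
        ins = []
        for t in op.inputs:
            if op.layouts.get(t.name) == "split":
                shard = (t.shape[0] // num_devices,) + t.shape[1:]
                ins.append(Tensor(f"{t.name}@{dev}", shard))
            else:  # broadcast: a full copy on every device
                ins.append(Tensor(f"{t.name}@{dev}", t.shape))
        physical.append(Operator(f"{op.name}@{dev}", ins, op.outputs))
    return physical

matmul = Operator(
    "matmul",
    inputs=[Tensor("x", (128, 64)), Tensor("w", (64, 32))],
    outputs=[Tensor("y", (128, 32))],
    layouts={"x": "split", "w": "broadcast"},  # data-parallel matmul
)
ops = logical_to_physical(matmul, num_devices=4)
print(ops[0].inputs[0].shape)  # (32, 64): batch dimension split 4 ways
```

Changing only the layout labels (e.g. splitting `w` instead of `x`) would express model parallelism with the same machinery, which is the appeal of pushing these attributes down to the operator level.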
Owner:ZHEJIANG LAB

Media-enhanced pipelined multiplication unit design method supporting multiple modes

The invention discloses a design method for a media-enhanced pipelined multiplication unit supporting multiple modes. Besides realizing common multiplication and multiply-accumulate operations, the unit provides a single-instruction multiple-data (SIMD) mode targeting the data parallelism found in media applications, a multiple-instruction multiple-data (MIMD) mode targeting non-data-parallel workloads in media applications, and a high-precision mode for high-precision data operations. These four multiplication modes support the complex digital signal processing algorithms of the media application field, and can be switched dynamically by instruction, effectively reducing algorithm-switching time. The multiplication unit applies pipelined design methods tailored to the different multiplication types, improving the speed of all multiplications and greatly enhancing operating efficiency. When applied to an embedded processor, the multiplication unit improves the processor's digital signal processing capability and widens its application domain.
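The SIMD mode described above can be modeled in software to show what "data parallelism in media applications" means at the instruction level. This is an illustrative model of the concept, not the patent's hardware design; the lane width of 4 is an assumption.

```python
# Software model contrasting scalar multiplication (one element per
# "instruction") with a SIMD-style multiply that applies one operation to
# every lane of a fixed-width vector at once. Purely illustrative.

LANES = 4  # hypothetical vector width

def scalar_multiply(a, b):
    # One multiply at a time: no data parallelism.
    return [x * y for x, y in zip(a, b)]

def simd_multiply(a, b):
    # Process the inputs LANES elements at a time, mimicking a vector
    # unit that issues one instruction per group of lanes.
    out = []
    for i in range(0, len(a), LANES):
        lane_a, lane_b = a[i:i + LANES], b[i:i + LANES]
        out.extend(x * y for x, y in zip(lane_a, lane_b))  # one vector op
    return out

a, b = [1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]
assert simd_multiply(a, b) == scalar_multiply(a, b)
print(simd_multiply(a, b))  # [8, 14, 18, 20, 20, 18, 14, 8]
```

The point of the contrast: both produce identical results, but the SIMD form issues one quarter as many "instructions", which is where the speedup in media workloads comes from.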
Owner:ZHEJIANG UNIV +1

Distributed training method based on hybrid parallelism

The invention discloses a distributed model training method based on hybrid parallelism, relating to the technical field of deep neural network model training; it solves the stated problems by adopting a hybrid mode of data parallelism and model parallelism across multiple nodes and multiple GPUs. First, for the problem of long training time, a distributed cluster method performs parallel computation over massive data, increasing training speed. Second, for the problem that the classification-layer model occupies too much GPU memory during training, a model-parallel mode is adopted: the classification-layer model is partitioned into several parts and deployed on multiple GPUs across multiple nodes in the cluster, and the number of nodes can be adjusted dynamically according to the size of the classification-layer model, supporting classification-model training under large numbers of class IDs. By combining data parallelism and model parallelism with distributed cluster training, the method greatly improves model training efficiency while preserving the original deep-learning training accuracy, and supports classification-model training at large ID scale.
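The model-parallel half of the hybrid scheme, splitting a huge classification layer across devices, can be sketched with a partition function and a check that sharded computation reproduces the full result. The partitioning function, the toy weight matrix, and the shapes are illustrative assumptions, not the patent's implementation.

```python
# Hedged sketch of column/row-partitioning a classification layer across
# devices (model parallelism): each "device" holds only its slice of the
# weight matrix and computes logits for its own class range.

NUM_DEVICES = 2

def partition_classes(num_classes, num_devices):
    """Split the class dimension as evenly as possible across devices."""
    base, rem = divmod(num_classes, num_devices)
    bounds, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def matvec(weight, features):
    # weight: num_classes x feat_dim; returns one logit per class.
    return [sum(w[i] * features[i] for i in range(len(features))) for w in weight]

# Full classification weight (4 classes, feature dim 3) and one feature vector.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
x = [2, 3, 4]

# Each device computes logits only for its own rows of W; concatenating
# the slices recovers exactly the full logit vector.
bounds = partition_classes(len(W), NUM_DEVICES)
sharded = []
for lo, hi in bounds:
    sharded.extend(matvec(W[lo:hi], x))

assert sharded == matvec(W, x)
print(bounds, sharded)  # [(0, 2), (2, 4)] [2, 3, 4, 9]
```

Because `partition_classes` takes `num_devices` as a parameter, growing the classification layer only requires passing a larger device count, which mirrors the abstract's claim that the number of nodes can be adjusted dynamically with the layer's size.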
Owner:XI'AN FENGHUO SOFTWARE TECH CO LTD

XGBoost soft measurement modeling method based on parallel LSTM auto-encoder dynamic feature extraction

The invention discloses an XGBoost soft measurement modeling method based on dynamic feature extraction by a parallel LSTM auto-encoder, belonging to the field of industrial process prediction and control. The LSTM auto-encoder extracts an encoding vector as a dynamic feature by reconstructing the input sequence; data parallelism and model parallelism are used to train the network in a distributed manner, improving modeling efficiency. The extracted dynamic features are combined with the original features to train an XGBoost model; finally, for each test sample, the features are repeatedly extracted, spliced, and fed into the XGBoost model until prediction of all samples is complete. The method helps address dynamic behavior in process soft measurement based on other models; because both data parallelism and model parallelism are adopted during network training, training speed is increased, the accuracy of the XGBoost model is kept unchanged or steadily improved after the dynamic features are introduced, and the robustness of the model is improved.
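The data-parallel distributed training mentioned here (and in the previous entry) usually means synchronous gradient averaging: each worker computes a gradient on its own data shard, the gradients are averaged, and one shared update is applied. The sketch below shows that loop on a toy one-parameter model; the model, loss, and learning rate are illustrative, not from the patent.

```python
# Generic sketch of synchronous data-parallel training: each "worker"
# computes a gradient on its own shard; an all-reduce-style average
# produces one shared parameter update. Toy model: y = w * x, MSE loss.

def gradient(w, shard):
    # d/dw of mean squared error over the shard for y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.05):
    grads = [gradient(w, s) for s in shards]   # computed in parallel
    avg = sum(grads) / len(grads)              # the all-reduce average
    return w - lr * avg                        # identical update everywhere

# Data generated by the true model y = 3x, split across two workers.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges to 3.0
```

Because every worker applies the same averaged gradient, the trajectory matches single-machine training on the full dataset, which is why the abstract can claim the original training effect is preserved while wall-clock time drops.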
Owner:ZHEJIANG UNIV

Matrix exponent-based parallel calculation method for electromagnetic transient simulation graphic processor

The invention provides a matrix-exponential-based parallel computation method for electromagnetic transient simulation on a graphics processor. An overall electromagnetic transient simulation model of the power system under study is built within a state-space analysis framework, and fast electromagnetic transient simulation of the power system is achieved by combining the data parallelism of the matrix exponential algorithm with the performance advantages of parallel computation on a graphics processor. The method retains the good numerical precision and stiff-problem handling capacity of matrix exponential integration, has general modeling and simulation capability for nonlinear power-system elements, and exploits the highly data-parallel character of the matrix exponential integration algorithm to achieve high efficiency in large-scale electromagnetic transient simulation of power systems. Parallel computation of the electromagnetic transient model of a general power system on a simulation graphics processor is realized on the basis of matrix exponential operations under the state-space analysis framework, and the computation speed of the matrix-exponential-based electromagnetic transient simulation method is improved.
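The core operation, advancing a linear state-space system dx/dt = Ax by x ← exp(hA)x, can be sketched with a truncated Taylor series. The truncation below is for illustration only; production codes use scaling-and-squaring or Krylov methods, and it is the independent per-element arithmetic of these matrix products that maps naturally onto GPU data parallelism.

```python
# Hedged sketch of the matrix exponential via truncated Taylor series:
# exp(A) = I + A + A^2/2! + A^3/3! + ...  (illustrative, not the patent's
# algorithm). Each matrix product is a grid of independent dot products,
# which is the data parallelism a GPU exploits.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=20):
    """exp(A) by truncated Taylor series."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    term = [row[:] for row in result]                               # A^0/0!
    for k in range(1, terms):
        term = mat_mul(term, A)                      # A^k
        term = [[v / k for v in row] for row in term]  # ... / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# For the nilpotent matrix [[0,1],[0,0]], exp is exactly [[1,1],[0,1]],
# so the truncated series is exact here.
E = mat_exp([[0.0, 1.0], [0.0, 0.0]])
print([[round(v, 6) for v in row] for row in E])  # [[1.0, 1.0], [0.0, 1.0]]
```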
Owner:TIANJIN TIANCHENG HENGCHUANG ENERGY TECH CO LTD

Hardware acceleration implementation architecture for backward training of convolutional neural network based on FPGA

The invention discloses a hardware acceleration architecture for backward training of a convolutional neural network based on an FPGA. Building on a basic processing module for the backward training of each layer of the convolutional neural network, the architecture balances processing time against resource consumption, and realizes the backward training process of the Hcnn convolutional neural network as a parallel pipeline using methods such as parallel-serial conversion, data partitioning, pipelined design, and resource reuse, following the principle of maximizing parallelism while minimizing resource consumption as far as possible. The architecture fully exploits the FPGA's data parallelism and pipeline parallelism; it is simple to implement, its structure is regular, its wiring is consistent, its clock frequency is greatly improved, and its acceleration effect is remarkable. More importantly, the architecture uses an optimized systolic array structure to balance IO reads/writes against computation, improving throughput while consuming less storage bandwidth, and effectively alleviating the mismatch between data access speed and data processing speed in FPGA implementations of convolutional neural networks.
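The systolic array named above can be modeled in software to show how it balances IO and compute: operands enter the array skewed in time, each processing element performs one multiply-accumulate per cycle, and every input value is reused across a whole row or column instead of being re-fetched. The timing model below is a standard output-stationary formulation, offered as an illustration rather than the patent's exact design.

```python
# Software model of an output-stationary systolic array for matrix
# multiplication. PE(i, j) receives A[i][k] from the left and B[k][j]
# from above at cycle t = i + j + k, multiplies them, and accumulates
# into its local register; skewed arrival is modeled explicitly.

def systolic_matmul(A, B):
    n = len(A)
    acc = [[0] * n for _ in range(n)]   # one accumulator per PE
    cycles = 3 * n - 2                  # enough cycles to drain the array
    for t in range(cycles):
        for i in range(n):
            for j in range(n):
                k = t - i - j           # which operand pair arrives now
                if 0 <= k < n:
                    acc[i][j] += A[i][k] * B[k][j]
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

Each element of A and B is read from memory once and then flows through n processing elements, which is exactly the bandwidth saving the abstract attributes to the systolic structure.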
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Analysis method for degree of parallelism of simulation task on basis of DAG (Directed Acyclic Graph)

The invention discloses a method for analyzing the degree of parallelism of a simulation task on the basis of a DAG (Directed Acyclic Graph). A prototype system realizing the method mainly comprises a DAG-based simulation-task description module, a DAG normalization module, and a simulation-task parallelism analysis module. The concrete implementation steps are: constructing the simulation-task description and parallelism analysis system; using the DAG-based description module to describe the computational complexity, communication coupling degree, and causal ordering of the simulation task as attributes; performing DAG normalization on the simulation task with the DAG normalization module; and using the parallelism analysis module to automatically analyze inter-task parallelism from the normalized DAG and obtain a quantified parallelism value. The method realizes parallelism analysis for simulation tasks oriented to high-performance simulation of complex systems: inter-task parallelism can be analyzed quickly, effectively, and automatically from the DAG description of the simulation task, ensuring the parallelism and efficiency of a high-performance simulation system.
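One common way to quantify a DAG's degree of parallelism, not necessarily the patent's metric, is total work divided by critical path length, with each node weighted by its computation cost. The sketch below computes it with a topological sweep; node costs and edges are illustrative.

```python
# Hedged sketch of a quantified parallelism value for a task DAG:
# parallelism = total work / critical path length. A value near 1 means
# the tasks are essentially serial; larger values mean more tasks can
# run concurrently.

def parallelism_degree(costs, edges):
    """costs: {node: cost}; edges: (u, v) pairs meaning u precedes v."""
    succs = {n: [] for n in costs}
    indeg = {n: 0 for n in costs}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    earliest_finish = {n: 0 for n in costs}
    ready = [n for n in costs if indeg[n] == 0]
    while ready:                      # Kahn's topological order
        u = ready.pop()
        earliest_finish[u] += costs[u]
        for v in succs[u]:
            # v cannot start before its slowest predecessor finishes.
            earliest_finish[v] = max(earliest_finish[v], earliest_finish[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    total_work = sum(costs.values())
    critical_path = max(earliest_finish.values())
    return total_work / critical_path

# Diamond DAG: a -> {b, c} -> d, unit costs. Work = 4, critical path = 3.
deg = parallelism_degree(
    {"a": 1, "b": 1, "c": 1, "d": 1},
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
)
print(round(deg, 3))  # 1.333: b and c can overlap, nothing else can
```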
Owner:BEIJING SIMULATION CENT

Three-dimensional reconstruction algorithm parallelization method based on GPU cluster

CN111968218A (Inactive)
The invention discloses a GPU-cluster-based parallelization method for a three-dimensional reconstruction algorithm, relating to the technical field of computer vision. Based on the SfM algorithm, the method studies the three-dimensional reconstruction pipeline for drone imagery and adopts a GPU cluster as the processing platform to address the time-consuming nature of dense three-dimensional reconstruction from drone images. Specifically, based on SfM/MVS theory, the real-scene three-dimensional reconstruction process from a picture sequence is established, and MPI parallel programming and GPU parallel programming are adopted to optimize and accelerate parts of the three-dimensional reconstruction pipeline. The method replaces sparse-reconstruction operators with cluster-based implementations, effectively addressing the large data volume and long computation time of drone aerial images; and the dense point-cloud reconstruction stage in the later phase of three-dimensional reconstruction is effectively accelerated through coarse-grained data parallelism in the dense reconstruction algorithm and fine-grained parallelism and optimization of the feature-extraction step of the dense matching algorithm.
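The coarse-grained data parallelism mentioned here typically means partitioning the image set across cluster nodes (MPI ranks), with each rank reconstructing only its own batch. The sketch below models just the assignment step with a plain function standing in for MPI; the round-robin scheme is an illustrative assumption.

```python
# Minimal sketch of coarse-grained data parallelism over an image set:
# each cluster node (MPI rank) is assigned a disjoint batch of images to
# process. MPI itself is modeled by a plain assignment function.

def assign_images(num_images, num_ranks):
    """Round-robin assignment of image indices to ranks."""
    return {r: list(range(r, num_images, num_ranks)) for r in range(num_ranks)}

batches = assign_images(10, 3)
print(batches)  # {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}

# Sanity check: every image is assigned to exactly one rank.
assigned = sorted(i for b in batches.values() for i in b)
assert assigned == list(range(10))
```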
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA