
Method for improving parallel NumPy computing performance by using characteristics of non-uniform memory access architecture

A parallel computing performance technology, applied in computing, program control design, and multi-program devices, which solves problems such as wasted memory bandwidth and achieves the effects of improved performance, more effective utilization of hardware resources, and reduced performance problems.

Active Publication Date: 2021-05-28
SUN YAT SEN UNIV
Cites: 9 | Cited by: 0

AI Technical Summary

Problems solved by technology

Although this allocation method minimizes the latency of data exchange between CPU cores, it clearly leaves RAM1, RAM2, and RAM3 and their memory bandwidth unused.



Examples


Embodiment

[0074] Figure 5 is a schematic diagram of the CPU allocation method optimized for NumPy parallel computing. The yellow part of the figure marks the cores selected by this allocation method: four cores distributed across NUMA0, NUMA1, NUMA2, and NUMA3.

[0075] The principle of this allocation method is to use as many NUMA nodes as possible. Compared with the traditional CPU allocation method, the available memory bandwidth is quadrupled. At the same time, because CPUs in both sockets are used, the available L3 cache is twice that of the traditional CPU allocation method.

[0076] Although this allocation method increases the overhead of data exchange between cores, it can make full use of the computer's memory and memory bandwidth. Application performance also benefits from the larger L3 cache capacity, which is a significant advantage for programs with good locality.

[0077] The traditional CPU allocation method minimizes the data exchange delay between CPU cores, but it leaves the memory and memory bandwidth of the remaining NUMA nodes unused.
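The allocation policy is described above only in prose. As a rough illustration, not the patent's actual allocator, the following Python sketch spreads one worker process per NUMA node with os.sched_setaffinity, assuming a Linux machine with four NUMA nodes of 16 contiguously numbered cores each (cores 0-15 on NUMA0, 16-31 on NUMA1, and so on); the CORES_PER_NODE value and core numbering are assumptions and should be replaced with the real topology reported by lscpu or /sys/devices/system/node.

```python
# Hypothetical sketch: one worker per NUMA node so that memory traffic is
# spread over all memory controllers (the idea described in [0074]-[0076]).
import os
import multiprocessing as mp
import numpy as np

CORES_PER_NODE = 16          # assumption, not from the patent
NUMA_NODES = [0, 1, 2, 3]    # assumption: four NUMA nodes

def worker(node, chunk):
    # Pin this process to the first core of its NUMA node so the scheduler
    # does not migrate it; memory the process touches afterwards is then
    # placed in that node's local RAM under Linux's first-touch policy.
    os.sched_setaffinity(0, {node * CORES_PER_NODE})
    return float(np.dot(chunk, chunk.T).sum())   # some NumPy work on the chunk

if __name__ == "__main__":
    data = np.random.rand(4096, 4096)
    chunks = np.array_split(data, len(NUMA_NODES))
    with mp.Pool(len(NUMA_NODES)) as pool:
        results = pool.starmap(worker, zip(NUMA_NODES, chunks))
    print(sum(results))
```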

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine


Abstract

The invention discloses a method for improving parallel NumPy computing performance by using the characteristics of a non-uniform memory access (NUMA) architecture. The method comprises the following steps: analyzing the characteristics of the NUMA architecture and of NumPy parallel computing; providing a central processing unit (CPU) allocation program optimized for NumPy parallel computing on NUMA architectures; and providing an optimized NumPy parallel computing system based on the CPU allocation program and process binding. The beneficial effects are that the CPU allocator generates a CPU configuration file according to the characteristics of the NUMA computer and of the NumPy parallel computation, and process binding is used to bind each process to a suitable CPU core. As a result, the computer's hardware resources are utilized more effectively, performance problems caused by process migration are reduced, and parallel computing performance is improved.
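The abstract does not reproduce the allocator itself. Purely as a hedged sketch of the "CPU configuration file" idea (the file name, JSON format, and round-robin policy below are illustrative assumptions, not the patent's), such a file could be derived on Linux from the NUMA topology exposed under /sys/devices/system/node and later consumed by a process-binding step:

```python
# Hypothetical CPU allocator sketch: read the NUMA topology from sysfs and
# emit a worker-to-core mapping that places successive workers on distinct
# NUMA nodes. File format and policy are illustrative, not the patent's.
import glob
import json

def read_numa_topology():
    topology = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = int(path.rsplit("node", 1)[1])
        with open(path + "/cpulist") as f:
            text = f.read().strip()              # e.g. "0-15" or "0-7,32-39"
        cores = []
        for part in text.split(","):
            lo, _, hi = part.partition("-")
            cores.extend(range(int(lo), int(hi or lo) + 1))
        topology[node] = cores
    return topology

def write_cpu_config(num_workers, out_path="cpu_config.json"):
    topology = read_numa_topology()
    nodes = sorted(topology)
    # Round-robin workers over NUMA nodes so memory bandwidth is spread out.
    mapping = {w: topology[nodes[w % len(nodes)]][w // len(nodes)]
               for w in range(num_workers)}
    with open(out_path, "w") as f:
        json.dump(mapping, f, indent=2)
    return mapping

if __name__ == "__main__":
    print(write_cpu_config(4))    # e.g. {0: 0, 1: 16, 2: 32, 3: 48}
```

A binding step could then read cpu_config.json and call os.sched_setaffinity for each worker, as in the sketch shown under the embodiment above.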

Description

Technical Field

[0001] The invention belongs to the technical field of computer algorithm performance improvement, and in particular relates to a method for improving parallelized NumPy computing performance by utilizing the characteristics of a non-uniform memory access architecture.

[0002] Technical Background

[0003] NumPy is a matrix and multidimensional array computing library for the Python language. Its core computational routines are implemented in C, so its efficiency can reach the level of a compiled language, and it can further improve the performance of linear algebra operations by linking to BLAS and LAPACK. NumPy is commonly used in scientific computing, machine learning, data analysis, and data visualization, fields whose demand for performance grows by the day. NumPy itself is a serial computing library, and parallel computing is an effective way to improve its performance. ...
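To make the background concrete (the snippet is illustrative and not part of the patent), the BLAS/LAPACK backend that a NumPy build links against can be inspected with np.show_config(), and the thread counts of common backends are controlled through environment variables that must be set before NumPy is imported:

```python
# Illustration of the background: NumPy delegates dense linear algebra to a
# BLAS/LAPACK backend, whose threading is configured via environment
# variables. The variable names cover OpenMP, OpenBLAS, and MKL builds;
# other builds may use different names.
import os

os.environ.setdefault("OMP_NUM_THREADS", "1")       # OpenMP-based backends
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")  # OpenBLAS
os.environ.setdefault("MKL_NUM_THREADS", "1")       # Intel MKL

import numpy as np

np.show_config()                 # report the BLAS/LAPACK NumPy was built with

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b                        # dispatched to the backend's dgemm routine
print(c.shape)
```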


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F11/34, G06F9/445, G06F9/50
CPC: G06F11/3409, G06F9/4451, G06F9/5016
Inventors: 梁嘉迪, 杜云飞, 卢宇彤, 肖侬
Owner: SUN YAT SEN UNIV