Multi-vector interleaved execution method for eliminating cache misses in SIMD-vectorized programs

The method relates to concurrent instruction execution, machine execution arrangements, and program control design. It addresses the problem that control flow divergence keeps a vectorized program from fully using the system's memory-level parallelism, and it reduces cache misses and improves performance.

Pending Publication Date: 2020-05-15
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

Control flow divergence can prevent a program from taking full advantage of the system's memory-level parallelism, because idle vector lanes (free slots) do not issue memory prefetches.
[0004] In summary, vectorized code is limited by the memory wall, and directly applying to vectorized code the software prefetching algorithms that avoid the memory wall in scalar code runs into the control flow divergence problem.
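
To make the divergence problem concrete, the following is a minimal sketch that emulates one vector of tuples going through a filter. The lane count `kLanes`, the predicate (`value < threshold`), and all function names are assumptions chosen for illustration; they do not come from the patent text. Lanes that fail the filter become idle, and an idle lane would issue no prefetch in a following memory-access step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed vector width for illustration (e.g. eight 64-bit lanes).
constexpr std::size_t kLanes = 8;

using Values = std::array<std::int64_t, kLanes>;
using LaneMask = std::array<bool, kLanes>;

// Emulates a vectorized filter: lane i stays active only if values[i] < threshold.
LaneMask apply_filter(const Values& values, std::int64_t threshold) {
    LaneMask mask{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        mask[i] = values[i] < threshold;
    }
    return mask;
}

// Number of lanes that would still issue a prefetch in the next step.
std::size_t active_lanes(const LaneMask& mask) {
    std::size_t n = 0;
    for (bool lane_active : mask) {
        n += lane_active ? 1 : 0;
    }
    return n;
}

// Number of idle lanes ("bubbles") that would issue no prefetch at all.
std::size_t idle_lanes(const LaneMask& mask) {
    const std::size_t active = active_lanes(mask);
    assert(active <= kLanes);
    return kLanes - active;
}
```

With a filter that passes half of the tuples, half of the lanes carry no work into the next step, so only half of the possible prefetches are in flight per vector.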

Examples

Embodiment

[0026] The invention is applied to query pipelines that face control flow divergence and heavy memory access; an example pipeline is shown in Figure 1. In this pipeline, tuples come from the scan operator, but only some of them pass the condition in the filter operator. The probe operator then computes the hash value of each surviving tuple's join key and matches it against the hash table, and finally the number of qualifying tuples is counted.
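
As a point of reference, the pipeline described above (scan, filter, hash and probe, count) can be sketched in plain scalar code. This is not the patented vectorized method, only a baseline that shows the data flow. The `Tuple` layout, the filter predicate, and the use of `std::unordered_set` as the hash table are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical tuple layout: a join key plus one attribute used by the filter.
struct Tuple {
    std::uint64_t join_key;
    std::int64_t attribute;
};

// Hypothetical filter condition; the source does not state the predicate.
inline bool passes_filter(const Tuple& t, std::int64_t threshold) {
    return t.attribute < threshold;
}

// Scalar baseline of the pipeline: scan -> filter -> hash + probe -> count.
std::size_t count_qualifying(const std::vector<Tuple>& scan_input,
                             const std::unordered_set<std::uint64_t>& build_keys,
                             std::int64_t threshold) {
    std::size_t count = 0;
    for (const Tuple& t : scan_input) {            // scan operator
        if (!passes_filter(t, threshold)) {        // filter operator
            continue;                              // this is where control flow diverges
        }
        if (build_keys.count(t.join_key) != 0) {   // hash the key and probe the table
            ++count;                               // count operator
        }
    }
    return count;
}
```

The `continue` in the filter is the scalar counterpart of a lane going idle in vectorized code.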

[0027] In the query pipeline of this embodiment, the process of computing the hash value of a tuple's join key and matching it against the hash table in the probe operator is shown in Figure 2. The probe runs on a chained hash table. Because of hash collisions, each hash bucket may contain multiple nodes. Each node consists of several tuples (for simplicity, only one tuple is stored here) and a pointer to the next node. In the hash join probe, a tuple is extracted sequentially from the relation table...
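
The chained hash table described above can be sketched as follows, with one tuple per node and a pointer to the next node in the bucket. The class name, the multiplicative hash function, and the `std::deque` node pool are assumptions for illustration; the patent text does not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// One node holds a single tuple (key and payload) and a pointer to the next node.
struct Node {
    std::uint64_t key;
    std::uint64_t payload;
    Node* next;
};

class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t bucket_count)
        : buckets_(bucket_count, nullptr) {
        assert(bucket_count > 0);
    }
    // Nodes are linked by raw pointers into pool_, so copying is disabled.
    ChainedHashTable(const ChainedHashTable&) = delete;
    ChainedHashTable& operator=(const ChainedHashTable&) = delete;

    // Inserts at the head of the bucket's chain.
    void insert(std::uint64_t key, std::uint64_t payload) {
        const std::size_t b = bucket_of(key);
        pool_.push_back(Node{key, payload, buckets_[b]});
        buckets_[b] = &pool_.back();  // deque::push_back keeps earlier element addresses valid
    }

    // Probe: walk the chain of the key's bucket and count matching nodes.
    // Each step dereferences a pointer, which is a likely cache miss on large tables.
    std::size_t count_matches(std::uint64_t key) const {
        std::size_t matches = 0;
        for (const Node* n = buckets_[bucket_of(key)]; n != nullptr; n = n->next) {
            if (n->key == key) {
                ++matches;
            }
        }
        return matches;
    }

private:
    // Assumed hash function (multiplicative hashing); any hash would do here.
    std::size_t bucket_of(std::uint64_t key) const {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ull) >> 32) % buckets_.size();
    }

    std::vector<Node*> buckets_;
    std::deque<Node> pool_;
};
```

Because chains differ in length, probes for different keys finish after different numbers of steps, which is one source of the divergence discussed earlier.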

Abstract

The invention discloses a multi-vector interleaved execution method for eliminating cache misses in SIMD-vectorized programs. The method executes multiple instances of the vectorized code in an interleaved fashion: when an instance reaches a data access, it only issues a data prefetch instruction and then switches to another instance, so that the pending data access overlaps with the work of the other instances. To handle control flow divergence in the vectorized code, a residual vector state is merged with the diverged vector state to eliminate bubbles in the vector. The method makes full use of both the data-level parallelism of SIMD vectors and the memory-level parallelism provided by the system, and reduces cache misses, branch mispredictions, and computational overhead. As a result, it significantly improves the performance of pointer-chasing applications and can be applied to the execution of an entire query pipeline.
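
The "prefetch, then switch" idea from the abstract can be illustrated with a simplified scalar sketch. It interleaves several probe instances over a chained hash table: each instance issues a prefetch for the node it needs next and then yields to the next instance. This sketch deliberately omits the SIMD lanes and the residual vector state, so it is not the patented method itself. The names `ProbeInstance`, `interleaved_probe_count`, and `group_size`, the modulo hash, and the `PREFETCH` macro (which maps to the GCC/Clang builtin `__builtin_prefetch` and to a no-op elsewhere) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)  // no-op fallback on other compilers
#endif

struct ChainNode {
    std::uint64_t key;
    const ChainNode* next;
};

// State of one in-flight probe: the key being probed and the node to visit next.
struct ProbeInstance {
    std::uint64_t key = 0;
    const ChainNode* node = nullptr;
    bool finished = false;
};

// Counts matches for probe_keys, keeping group_size probes in flight.
// Assumed hash: bucket index = key % buckets.size().
std::size_t interleaved_probe_count(const std::vector<const ChainNode*>& buckets,
                                    const std::vector<std::uint64_t>& probe_keys,
                                    std::size_t group_size) {
    assert(!buckets.empty() && group_size > 0);
    std::vector<ProbeInstance> group(group_size);
    std::size_t next_input = 0, matches = 0, done = 0;

    while (done < group_size) {
        for (ProbeInstance& inst : group) {
            if (inst.finished) continue;
            if (inst.node != nullptr) {
                // This node was prefetched on the instance's previous turn.
                if (inst.node->key == inst.key) ++matches;
                inst.node = inst.node->next;
            }
            // Chain exhausted: start the next input key in this slot.
            while (inst.node == nullptr && next_input < probe_keys.size()) {
                inst.key = probe_keys[next_input++];
                inst.node = buckets[inst.key % buckets.size()];
            }
            if (inst.node == nullptr) {  // no inputs left for this slot
                inst.finished = true;
                ++done;
                continue;
            }
            PREFETCH(inst.node);  // issue the prefetch only, then switch instances
        }
    }
    return matches;
}
```

A larger `group_size` keeps more memory requests outstanding at once; the result is the same for any `group_size`, only the overlap of memory latency changes.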

Description

Technical field

[0001] The invention belongs to the technical field of software development, and in particular relates to a multi-vector interleaved execution method for eliminating cache misses in SIMD-vectorized programs.

Background

[0002] To improve processing performance, modern processors offer data-level parallelism through SIMD (Single Instruction, Multiple Data) instructions. SIMD instruction sets are widely used to accelerate operations in databases, graphics, and other domains, including joins, partitioning, sorting, Bloom filters, selections, set intersection, and compression. These operations benefit from vectorized SIMD execution, which reduces computational overhead and branch mispredictions. However, when in-memory data is accessed frequently and randomly, as in probing hash tables, probing Bloom filters, and searching trees, the benefits of SIMD are diminished or lost entirely, because these operations are slowed down by memory access latencies when working w...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/38
CPC: G06F9/3887
Inventor: 翁楚良, 方祝和, 郑蓓蕾
Owner: EAST CHINA NORMAL UNIV