Sorting for data-parallel computing devices

A sorting technique for data-parallel computing devices, applied in computing, data sorting, input data processing, and similar fields. It addresses the inefficiency of existing approaches when processing small sets of data elements, and their inability to reach the peak throughput of a data-parallel device unless the number of elements is very large.

Active Publication Date: 2019-07-02
GOOGLE LLC

AI Technical Summary

Problems solved by technology

However, these semi-parallel and parallel algorithms suffer from many drawbacks. They often cannot fully utilize the computational power of the data-parallel device or reach peak throughput until the number of values to be sorted is large enough to saturate the device's bandwidth.
Furthermore, these semi-parallel algorithms often exhibit significant branch divergence, where adjacent processing elements do not execute the same instructions, which prevents the data-parallel device from becoming fully compute-bound.
Processing smaller sets of data elements is also inefficient, because the data-parallel device may stall while waiting for data to be loaded or stored.
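
The cost of branch divergence described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the helper names `insertion_sort_shift_count` and `lockstep_utilization` are hypothetical, insertion sort is used merely as a familiar data-dependent algorithm, and the model assumes that lanes running in lockstep must all wait for the slowest lane. It is not taken from the patent.

```python
def insertion_sort_shift_count(values):
    """Count the data-dependent inner-loop iterations of a plain insertion sort."""
    items = list(values)
    shifts = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:  # trip count depends on the data
            items[j + 1] = items[j]
            j -= 1
            shifts += 1
        items[j + 1] = key
    return shifts


def lockstep_utilization(lanes):
    """Fraction of lane-steps doing useful work if every lane waits for the slowest.

    Assumption (simplified model): all lanes advance in lockstep, so a lane
    that finishes its inner loops early sits idle until the slowest lane is done.
    """
    counts = [insertion_sort_shift_count(lane) for lane in lanes]
    slowest = max(counts)
    if slowest == 0:
        return 1.0
    return sum(counts) / (slowest * len(counts))
```

In this model, one already-sorted lane paired with one reverse-sorted lane of four elements yields a utilization of one half, because the first lane has no shifts to perform while the second needs six.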



Examples


Embodiment Construction

[0036] Overview

[0037] This technique involves sorting and mapping data elements on a computer system. In particular, the sorting and mapping of data elements is performed on a data-parallel computing device, such as a graphics processing unit (GPU), using a fully parallel processing pipeline.

[0038] The parallel processing pipeline can be implemented and controlled through custom application programming interfaces (APIs) that give the data-parallel computing device, such as a GPU, access to the kernel programs that perform the processing. In this regard, each kernel may form part of the parallel processing pipeline, with each kernel using standard APIs and sub-APIs to perform all of the sorting, merging, mapping, and other processing of the data elements.

[0039] To efficiently execute a fully parallel processing pipeline, any program and/or algorithm should execute the same instructions on every element of the processor group, minimizing off-chip I/O operatio...
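
One way to picture "the same instructions on every element of the processor group" is a sorting network, where the sequence of compare-exchange steps is fixed in advance and does not depend on the data. The sketch below is an illustration only, not the patent's kernel code: it uses an odd-even transposition network as a stand-in technique, the function names are hypothetical, and the lanes are modelled as plain Python lists processed one after another rather than in parallel.

```python
def odd_even_transposition_schedule(n):
    """Return the data-independent list of (i, j) register pairs to compare."""
    schedule = []
    for phase in range(n):
        start = phase % 2
        schedule.extend((i, i + 1) for i in range(start, n - 1, 2))
    return schedule


def sort_lanes_descending(lanes):
    """Sort each lane's registers in descending order using one shared schedule.

    `lanes` is a list of equal-length lists; lanes[k][r] models register r of
    lane k (a hypothetical layout chosen for this sketch).
    """
    n = len(lanes[0])
    result = [list(lane) for lane in lanes]
    for i, j in odd_even_transposition_schedule(n):
        # Every lane performs the identical compare-exchange on registers i and j.
        # On a real device the lanes would execute this step in parallel.
        for regs in result:
            hi = max(regs[i], regs[j])
            lo = min(regs[i], regs[j])
            regs[i], regs[j] = hi, lo
    return result
```

Because the schedule depends only on the register count, two lanes holding very different data still follow the same instruction stream; only the values moved by `max` and `min` differ.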



Abstract

Aspects of the disclosure relate to determining relevant content in response to a request for information. One or more computing devices (170) may load data elements into registers (385A-385B), wherein each register is associated with at least one parallel processor in a group of parallel processors (380A-380B). For each of the parallel processors, the data elements loaded in its associated registers may be sorted, in parallel, in descending order. The sorted data elements, for each of the parallel processors, may be merged with the sorted data elements of other processors in the group. The merged and sorted data elements may be transposed and stored.
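
The steps listed in the abstract (load, per-processor sort, merge, transpose, store) can be sketched on the host as follows. This is a minimal model under stated assumptions, not the claimed implementation: `sort_merge_transpose` is a hypothetical name, Python's built-in `sorted` and `heapq.merge` stand in for the device-side sorting and merging, and the choice to give each processor a run of consecutive elements and to transpose a processors-by-registers layout is an assumption made for illustration.

```python
import heapq


def sort_merge_transpose(data, num_processors):
    """Model the abstract's steps on the host; not a device implementation."""
    if len(data) % num_processors != 0:
        raise ValueError("data length must be a multiple of num_processors")
    regs = len(data) // num_processors

    # 1. Load: processor k receives `regs` consecutive elements into its registers.
    lanes = [data[k * regs:(k + 1) * regs] for k in range(num_processors)]

    # 2. Sort each processor's registers in descending order (independent work).
    lanes = [sorted(lane, reverse=True) for lane in lanes]

    # 3. Merge the sorted runs of all processors in the group.
    merged = list(heapq.merge(*lanes, reverse=True))

    # 4. Redistribute the merged run, then transpose the processors-by-registers
    #    layout into registers-by-processors before returning it for storage.
    lanes = [merged[k * regs:(k + 1) * regs] for k in range(num_processors)]
    return [list(row) for row in zip(*lanes)]
```

For example, `sort_merge_transpose([3, 7, 1, 8, 2, 9, 4, 6], 2)` returns `[[9, 4], [8, 3], [7, 2], [6, 1]]`: reading down the first column and then the second gives the fully sorted descending sequence.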

Description

[0001] Cross References to Related Applications

[0002] This application claims the benefit of the filing date of U.S. Provisional Patent Application Serial No. 62/421,544, filed November 14, 2016, the disclosure of which is incorporated herein by reference.

Background

[0003] The ability to quickly and efficiently sort data is critical to many operations of computing devices. For example, many applications (such as search, data query processing, graphics, sparse linear algebra, machine learning, etc.) require substantially real-time data sorting. Many sorting methods rely on a single-threaded CPU to execute sequential sorting algorithms. This method of sorting is time consuming and requires significant processing resources.

[0004] Recent improvements in sorting methods include semi-parallel and parallel algorithms executed by data-parallel devices such as graphics processing units (GPUs). However, these semi-parallel and parallel algorithms suffer from many d...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F7/24, G06F7/36
CPC: G06F7/24, G06F7/36, G06F9/52
Inventor: A. S. MacKinnon
Owner: GOOGLE LLC