Sorting is a fundamental problem in computer science, and it can achieve high performance by taking advantage of the parallel resources of modern GPUs. Rodinia includes a sorting algorithm that uses a hybrid method.
We would like to acknowledge Erik Sintorn and Ulf Assarsson at Chalmers University of Technology, Gothenburg, Sweden, who contributed their code to the Rodinia benchmark suite. The related paper can be found here.
First, a parallel bucketsort splits the input list into enough sublists, which are then sorted in parallel using merge sort. The parallel bucketsort, implemented in NVIDIA’s CUDA, uses synchronization mechanisms, such as atomic increment, that are available on modern GPUs.
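The two-phase scheme described above can be illustrated with a minimal CPU-side C++ sketch. This is not the Rodinia CUDA code: the names `hybrid_sort`, `bucket_of`, and `merge_sort` are hypothetical, the equal-width bucket ranges are an assumption (the text does not say how bucket boundaries are chosen), and `std::atomic::fetch_add` stands in for the GPU atomic increment. The loops run sequentially here; on a GPU, the counting and scatter loops would be spread across threads, and each bucket would be merge-sorted in parallel.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map a value in [lo, hi] to one of num_buckets
// equal-width ranges (an assumed bucketing rule, chosen for simplicity).
static std::size_t bucket_of(float v, float lo, float hi, std::size_t num_buckets) {
    if (hi <= lo) return 0;  // all values equal: everything lands in bucket 0
    double t = (static_cast<double>(v) - lo) / (static_cast<double>(hi) - lo);
    std::size_t b = static_cast<std::size_t>(t * static_cast<double>(num_buckets));
    return std::min(b, num_buckets - 1);  // clamp v == hi into the last bucket
}

// Top-down merge sort on data[begin, end), using scratch as temporary storage.
static void merge_sort(std::vector<float>& data, std::vector<float>& scratch,
                       std::size_t begin, std::size_t end) {
    if (end - begin < 2) return;
    std::size_t mid = begin + (end - begin) / 2;
    merge_sort(data, scratch, begin, mid);
    merge_sort(data, scratch, mid, end);
    std::merge(data.begin() + begin, data.begin() + mid,
               data.begin() + mid, data.begin() + end,
               scratch.begin() + begin);
    std::copy(scratch.begin() + begin, scratch.begin() + end, data.begin() + begin);
}

// Sketch of the hybrid sort: bucket the input, then merge-sort each bucket.
// Assumes the input contains no NaN values.
std::vector<float> hybrid_sort(const std::vector<float>& input, std::size_t num_buckets) {
    assert(num_buckets > 0);
    if (input.empty()) return {};

    auto mm = std::minmax_element(input.begin(), input.end());
    const float lo = *mm.first;
    const float hi = *mm.second;

    // Pass 1: count the elements of each bucket with atomic increments.
    std::vector<std::atomic<std::size_t>> counts(num_buckets);
    for (auto& c : counts) c.store(0);
    for (float v : input) counts[bucket_of(v, lo, hi, num_buckets)].fetch_add(1);

    // An exclusive prefix sum turns the counts into bucket start offsets.
    std::vector<std::size_t> start(num_buckets + 1, 0);
    for (std::size_t b = 0; b < num_buckets; ++b) start[b + 1] = start[b] + counts[b].load();

    // Pass 2: scatter. Each element claims the next free slot in its bucket
    // by atomically incrementing that bucket's cursor.
    std::vector<std::atomic<std::size_t>> cursor(num_buckets);
    for (std::size_t b = 0; b < num_buckets; ++b) cursor[b].store(start[b]);
    std::vector<float> out(input.size());
    for (float v : input) {
        std::size_t b = bucket_of(v, lo, hi, num_buckets);
        out[cursor[b].fetch_add(1)] = v;
    }

    // Buckets cover disjoint, increasing value ranges, so sorting each one
    // independently yields a fully sorted array.
    std::vector<float> scratch(out.size());
    for (std::size_t b = 0; b < num_buckets; ++b) merge_sort(out, scratch, start[b], start[b + 1]);
    return out;
}
```

For example, calling `hybrid_sort(values, 3)` on an unsorted `std::vector<float>` returns a new vector with the same elements in ascending order. The atomic counters matter only once the counting and scatter loops run concurrently, which is the situation the GPU implementation addresses.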