Thomas W. Christopher | Consulting in High Performance Computing. Training in Concurrency, Networking, and Parallelism, especially in Python and Java. |
Abstract
Continuing a tutorial on sorting algorithms, this page animates bitonic sort.
Author
Thomas W. Christopher
Bitonic sort is a sorting algorithm designed specially for parallel machines.
A sorted sequence is a monotonically non-decreasing (or non-increasing) sequence. A bitonic sequence is composed of two subsequences, one monotonically non-decreasing and the other monotonically non-increasing. A "V" and an A-frame are examples of bitonic sequences.
I get tired of saying "non-decreasing" and "non-increasing." They are clunky and throw an extra negative into sentences. I will use "ascending" to mean "non-decreasing" and "descending" to mean "non-increasing."
Moreover, any rotation of a bitonic sequence is a bitonic sequence, or if you prefer, one of the subsequences can wrap around the end of the bitonic sequence.
Of course, a sorted sequence is itself a bitonic sequence: one of the subsequences is empty.
Now we come to a strange property of bitoinic sequences, the property that is uses in bitonic sort:
Suppose you have a bitonic sequence of length 2n, that is, elements in positions [0,2n). You can easily divide it into two halves, [0,n) and [n,2n), such that
- each half is a bitonic sequence, and
- every element in half [0,n) is less than or equal to each element in [n,2n). (Or greater than or equal to, of course.)
What is this easy method? Simply compare elements in the corresponding positions in the two halves and exchange them if they are out of order.
for (i=0;i<n;i++) {
if (get(i)>get(i+n)) exchange(i,i+n);
}
This is sometimes called a bitonic merge.
Why does it work? I'm not sure I could do a concise and clear enough proof to fit in with these pages. Let's just leave it without a proof, but true.
So here's how we do a bitonic sort:
Bitonic Sort This first version actually uses recursion. It uses methods sortup, sortdown, mergeup, and mergedown, to sort into ascending order or descending order and to recursively merge into ascending or descending order.
Method void sortup(int m, int n) will sort the n elements in the range [m,m+n) into ascending order. It uses method void mergeup(int m, int n) to merge the n elements in the subsequence [m,m+n) into ascending order. Similarly for void sortdown(int m, int n) and void mergedown(int m, int n).
The overall sort is performed by a call:
sortup(0,N);
Both sortup and sortdown recursively sort each half to produce an A-frame shape and then recursively merge that into an ascending or descending sequence.
void sortup(int m, int n) {//from m to m+n
if (n==1) return;
sortup(m,n/2);
sortdown(m+n/2,n/2);
mergeup(m,n/2);
}
void sortdown(int m, int n) {//from m to m+n
if (n==1) return;
sortup(m,n/2);
sortdown(m+n/2,n/2);
mergedown(m,n/2);
}
Methods mergeup and mergedown are fairly straightfoward. They compare elements in the two halves, exchange them if they are out of order, and recursively merge the two halves. Call mergeup( m, n) sorts into ascending order the 2*n elements in the range [m,2n).
void mergeup(int m, int n) {
if (n==0) return;
int i;
for (i=0;i<n;i++) {
if (get(m+i)>get(m+i+n)) exchange(m+i,m+i+n);
}
mergeup(m,n/2);
mergeup(m+n,n/2);
}
void mergedown(int m, int n) {
if (n==0) return;
int i;
for (i=0;i<n;i++) {
if (get(m+i)<get(m+i+n)) exchange(m+i,m+i+n);
}
mergedown(m,n/2);
mergedown(m+n,n/2);
}
Bitonic Sort2 Bitonic sort is perfect for parallelization. The recursive calls of merge can be done in parallel. The loops in the merges, comparing and conditionally exchanging elements (m+i) and (m+i+n) can also be run in parallel.
However, getting it to run efficiently in parallel requires turning the algorithm upside down. The algorithm is expressed in terms of recursive calls. What we need is an algorithm in terms of each individual element.
Let's examine what the elements are doing. We will use an eight-element array and identify the elements by their positions in binary.
index 0 1 2 3 4 5 6 7 in binary 0000 0001 0010 0011 0100 0101 0110 0111
The bits in the addresses are numbered from 0 on the right: the rightmost bit is bit 0, the bit to its left is bit 1, etc. Bit j contributes 2^{j} to the value of the binary number.
Here is the sequence of operations that can be performed in parallel:
[0000,0001] | ascending |
[0010,0011] | descending |
[0100,0101] | ascending |
[0110,0111] | descending |
- Pairs of elements whose numbers differ in bit 0, the lowest bit, are compared and conditionally exchanged.
- Pairs of numbers whose bit 1 is zero are sorted in ascending order. Those whose bit 1 is one are sorted into descending order.
First, corresponding to the top level merge call:
0000<->0010 ascending 0001<->0011 ascending 0100<->0110 descending 0101<->0111 descending
Second, corresponding to the recursive merge calls:
0000<->0001 ascending 0010<->0011 ascending 0100<->0101 descending 0110<->0111 descending
In both cases, bit 2 determines whether the elements are sorted ascending (bit 2 equals 0) or descending (=1).
In the first level recursive call of mergeup or mergedown, the elements compared have positions that differ in bit 1. In the second level calls, the elements compared differ in bit 0.
First level call of mergeup:
0000<->0100 | ascending |
0001<->0101 | ascending |
0010<->0110 | ascending |
0011<->0111 | ascending |
Two second level recursive calls of mergeup:
0000<->0010 ascending 0001<->0011 ascending 0100<->0110 ascending 0101<->0111 ascending
Four third level recursive calls of mergeup:
0000<->0001 ascending 0010<->0011 ascending 0100<->0101 ascending 0110<->0111 ascending
In all cases, bit 3 determines that the elements are sorted ascending (bit 3 equals 0).
In the first level recursive call of mergeup or mergedown, the elements compared have positions that differ in bit 2. In the second level calls, the elements compared differ in bit 1. In the third level calls, the elements compared differ in bit 0.
So here's the code:
int i,j,k;
for (k=2;k<=N;k=2*k) {
for (j=k>>1;j>0;j=j>>1) {
for (i=0;i<N;i++) {
int ixj=i^j;
if ((ixj)>i) {
if ((i&k)==0 && get(i)>get(ixj)) exchange(i,ixj);
if ((i&k)!=0 && get(i)<get(ixj)) exchange(i,ixj);
}
}
}
}In this code, k selects the bit position that determines whether the pairs of elements are to be exchanged into ascending or descending order. Variable j corresponds to the distance apart the elements are that are to be compared and conditionally exchanged. Variable i goes through all the elements; ixj is the element that is the pair of element i (the exclusive-or of i and j, the element whose position differs only in bit position (log_{2} j)). We only compare elements i and ixj if i<ixj. This avoids comparing them twice.
Bitonic Sort3. The third version of bitonic sort is the same as the second except that the array is shown after each iteration of the for i loop. This gives the appearance of a parallel algorithm.
In this version, the delay between each showing of the array is set to five times the delay following exchanges of single elements in the other versions. There are two reasons: first, it simulates the longer time required to exchange elements between nodes of a distributed memory parallel computer, and second, it slows down the simulation enough that you will be able to make sense of it.
Parallel bitonic sort. Here is a version of bitonic sort that uses the Tools of Computing thread package. A version of the first bitonic sort algorithm builds a task graph to do the sorting. The tasks are placed in a RunQueue when they become runnable. Tasks become runnable when all tasks in a predecessor set have terminated. There are a limited number of Threads (4) running tasks from the queue. When tasks complete, they signal a TerminationGroup. When all tasks in a TerminationGroup have signaled their completion, the tasks waiting for the termination of that group are made runnable.
Back to beginning of the tutorial
These AdSense ads may be a little inappropriate for an academic page, but they may help pay the costs of maintaining the website.