Thomas W. Christopher Consulting in High Performance Computing. Training in Concurrency, Networking, and Parallelism, especially in Python and Java.

# Bitonic Sort

Abstract

Author
Thomas W. Christopher

Bitonic sort is a sorting algorithm designed specially for parallel machines.

A sorted sequence is a monotonically non-decreasing (or non-increasing) sequence. A bitonic sequence is composed of two subsequences, one monotonically non-decreasing and the other monotonically non-increasing. A "V" and an A-frame are examples of bitonic sequences.

I get tired of saying "non-decreasing" and "non-increasing." They are clunky and throw an extra negative into sentences. I will use "ascending" to mean "non-decreasing" and "descending" to mean "non-increasing."

Moreover, any rotation of a bitonic sequence is a bitonic sequence, or if you prefer, one of the subsequences can wrap around the end of the bitonic sequence.

Of course, a sorted sequence is itself a bitonic sequence: one of the subsequences is empty.

Now we come to a strange property of bitoinic sequences, the property that is uses in bitonic sort:

Suppose you have a bitonic sequence of length 2n, that is, elements in positions [0,2n). You can easily divide it into two halves, [0,n) and [n,2n), such that

• each half is a bitonic sequence, and
• every element in half [0,n) is less than or equal to each element in [n,2n). (Or greater than or equal to, of course.)

What is this easy method? Simply compare elements in the corresponding positions in the two halves and exchange them if they are out of order.

for (i=0;i<n;i++) {
if (get(i)>get(i+n)) exchange(i,i+n);
}

This is sometimes called a bitonic merge.

Why does it work? I'm not sure I could do a concise and clear enough proof to fit in with these pages. Let's just leave it without a proof, but true.

So here's how we do a bitonic sort:

• We sort only sequences a power of two in length, so we can always divide a subsequence of mnore than one element into two halves.
• We sort the lower half into ascending (=non-decreasing, remember) order and the upper half into descending order. This gives us a bitonic sequence.
• We perform a bitonic merge on the sequence, which gives us a bitonic sequence in each half and all the larger elements in the upper half.
• We recursively bitonically merge each half until all the elements are sorted.

Bitonic Sort Bitonic Sort This first version actually uses recursion. It uses methods sortup, sortdown, mergeup, and mergedown, to sort into ascending order or descending order and to recursively merge into ascending or descending order.

Method void sortup(int m, int n) will sort the n elements in the range [m,m+n) into ascending order. It uses method void mergeup(int m, int n) to merge the n elements in the subsequence [m,m+n) into ascending order. Similarly for void sortdown(int m, int n) and void mergedown(int m, int n).

The overall sort is performed by a call:

sortup(0,N);

Both sortup and sortdown recursively sort each half to produce an A-frame shape and then recursively merge that into an ascending or descending sequence.

void sortup(int m, int n) {//from m to m+n
if (n==1) return;
sortup(m,n/2);
sortdown(m+n/2,n/2);
mergeup(m,n/2);
}
void sortdown(int m, int n) {//from m to m+n
if (n==1) return;
sortup(m,n/2);
sortdown(m+n/2,n/2);
mergedown(m,n/2);
}

Methods mergeup and mergedown are fairly straightfoward. They compare elements in the two halves, exchange them if they are out of order, and recursively merge the two halves. Call mergeup( m,  n) sorts into ascending order the 2*n elements in the range [m,2n).

void mergeup(int m, int n) {
if (n==0) return;
int i;
for (i=0;i<n;i++) {
if (get(m+i)>get(m+i+n)) exchange(m+i,m+i+n);
}
mergeup(m,n/2);
mergeup(m+n,n/2);
}
void mergedown(int m, int n) {
if (n==0) return;
int i;
for (i=0;i<n;i++) {
if (get(m+i)<get(m+i+n)) exchange(m+i,m+i+n);
}
mergedown(m,n/2);
mergedown(m+n,n/2);
}

Bitonic Sort2 Bitonic Sort2 Bitonic sort is perfect for parallelization. The recursive calls of merge can be done in parallel. The loops in the merges, comparing and conditionally exchanging elements (m+i) and (m+i+n) can also be run in parallel.

However, getting it to run efficiently in parallel requires turning the algorithm upside down. The algorithm is expressed in terms of recursive calls. What we need is an algorithm in terms of each individual element.

Let's examine what the elements are doing. We will use an eight-element array and identify the elements by their positions in binary.

 index 0 1 2 3 4 5 6 7 in binary 0000 0001 0010 0011 0100 0101 0110 0111

The bits in the addresses are numbered from 0 on the right: the rightmost bit is bit 0, the bit to its left is bit 1, etc. Bit j contributes 2j to the value of the binary number.

Here is the sequence of operations that can be performed in parallel:

1. All elements are sorted single element subsequences.
2. Pairs of elements are sorted into ascending or descending subsequences:
 [0000,0001] ascending [0010,0011] descending [0100,0101] ascending [0110,0111] descending
• Pairs of elements whose numbers differ in bit 0, the lowest bit, are compared and conditionally exchanged.
• Pairs of numbers whose bit 1 is zero are sorted in ascending order. Those whose bit 1 is one are sorted into descending order.
1. These pairs are compared and conditionally exchanged:

First, corresponding to the top level merge call:

 0000<->0010 ascending 0001<->0011 ascending 0100<->0110 descending 0101<->0111 descending

Second, corresponding to the recursive merge calls:

 0000<->0001 ascending 0010<->0011 ascending 0100<->0101 descending 0110<->0111 descending

In both cases, bit 2 determines whether the elements are sorted ascending (bit 2 equals 0) or descending (=1).

In the first level recursive call of mergeup or mergedown, the elements compared have positions that differ in bit 1. In the second level calls, the elements compared differ in bit 0.

1. The final series of merges that puts the array in order corresponds to three levels of calls to mergeup.

First level call of mergeup:

 0000<->0100 ascending 0001<->0101 ascending 0010<->0110 ascending 0011<->0111 ascending

Two second level recursive calls of mergeup:

 0000<->0010 ascending 0001<->0011 ascending 0100<->0110 ascending 0101<->0111 ascending

Four third level recursive calls of mergeup:

 0000<->0001 ascending 0010<->0011 ascending 0100<->0101 ascending 0110<->0111 ascending

In all cases, bit 3 determines that the elements are sorted ascending (bit 3 equals 0).

In the first level recursive call of mergeup or mergedown, the elements compared have positions that differ in bit 2. In the second level calls, the elements compared differ in bit 1. In the third level calls, the elements compared differ in bit 0.

So here's the code:

`    int i,j,k;    for (k=2;k<=N;k=2*k) {      for (j=k>>1;j>0;j=j>>1) {        for (i=0;i<N;i++) {          int ixj=i^j;          if ((ixj)>i) {            if ((i&k)==0 && get(i)>get(ixj)) exchange(i,ixj);            if ((i&k)!=0 && get(i)<get(ixj)) exchange(i,ixj);          }        }      }    }`

In this code, k selects the bit position that determines whether the pairs of elements are to be exchanged into ascending or descending order. Variable j corresponds to the distance apart the elements are that are to be compared and conditionally exchanged.   Variable i goes through all the elements; ixj is the element that is the pair of element i (the exclusive-or of i and j, the element whose position differs only in bit position (log2 j)). We only compare elements i and ixj if i<ixj. This avoids comparing them twice.

Bitonic Sort3 Bitonic Sort3. The third version of bitonic sort is the same as the second except that the array is shown after each iteration of the for i loop. This gives the appearance of a parallel algorithm.

In this version, the delay between each showing of the array is set to five times the delay following exchanges of single elements in the other versions. There are two reasons: first, it simulates the longer time required to exchange elements between nodes of a distributed memory parallel computer, and second, it slows down the simulation enough that you will be able to make sense of it.