Sunday, July 1, 2018

C# : Comparison Sorting Algorithms

In Computer Science and Mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. 

These algorithms are briefly divided into two parent categories as Comparison based sorting algorithms, which include BubbleSort, SelectionSort, InsertionSort, ShellSort, MergeSort, HeapSort, and QuickSort, as most commonly used amongst all, and Non-Comparison based sorting algorithms, such as RadixSort, BucketSort etc. The focus of this article is to provide various Comparison based sorting algorithms in a simplistic way with optimized, cleaned up, and easy to understand code (with proper comments and instructions).

----


What is time complexity of .NET List.sort()

This method uses Array.Sort, which uses the QuickSort algorithm. This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation.

-----------

Note :  Insertion Sort performs better than Bubble Sort.

---

Background

In the process of getting deeper and deeper into various Comparison based sorting algorithms, I realised that most of the algorithms are illustrated using pointers in C / C++, and they have rarely been coded in C# for obvious reasons. However, being primarily a C# developer, I know the pain one has to undergo through if he wants to relate the algorithm to the specifications in his own programming language and program in a simple and efficient manner.
This article is a compiled list of my coding adventures with these sorting algorithms. I will not illustrate graphically or pictorially as to how these algorithms work because Wikipedia is the best source for that, and in fact, I have inherited this knowledge from there itself. However, my focus will be from a pure programmer's perspective on the implementation of these algorithms in C# and what needs to be taken care of while keeping in mind the best case, worst case, and average case complexities.

Using the code

Under the parent category 'Comparison based sorting algorithms', we have several sub-categories such as Exchange sorts, Selection sorts, Insertion sorts, Merge sorts etc., which we are going to see in a short while, with implementation.
The first step is to create sample datasets (array, in our case). Also create two swap algorithms, one with a temp variable and one without a temp variable, to be used in our sorting mechanisms.
long[] inputArray = new long[1000];

Random rnd = new Random();

for (int i = 0; i < inputArray.Length; i++)
{
    inputArray[i] = rnd.Next();
}

private void Swap(ref long valOne, ref long valTwo)
{
    valOne = valOne + valTwo;
    valTwo = valOne - valTwo;
    valOne = valOne - valTwo;
}

private void SwapWithTemp(ref long valOne, ref long valTwo)
{
    long temp = valOne;
    valOne = valTwo;
    valTwo = temp;
}
This is the sample data (random) and then the swap mechanisms we are going to use with our sorting algorithms.

Exchange sorts

Now let's begin with the category 'Exchange based sorts'. We have two most commonly used algorithms here. One is the simplistic and trivial 'Bubble Sort', and another is the fastest in the group, 'Quick Sort'.

Bubble Sort

It is a straightforward and simplistic method of sorting data. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, then it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. This algorithm is highly inefficient, and is rarely used.
  • Best case - O(n)
  • Average case - O(n^2)
  • Worst case - O(n^2)
  • Stability - stable
private void BubbleSort(long[] inputArray)
{
    for (int iterator = 0; iterator < inputArray.Length; iterator++)
    {
        for (int index = 0; index < inputArray.Length - 1; index++)
        {
            if (inputArray[index] > inputArray[index + 1])
            {
                Swap(ref inputArray[index], ref inputArray[index+1]);
            }
        }
    }
}

Quick Sort

It is a divide and conquer algorithm which relies on a partition operation, i.e., to partition an array, choose an element called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. Later, recursively sort the lesser and greater sublists.
  • Best case - O(n log n)
  • Average case - O(n log n)
  • Worst case - O(n^2)
  • Stability - depends on how the pivot is handled
private void QuickSort(long[] inputArray)
{
    int left = 0;
    int right = inputArray.Length - 1;
    InternalQuickSort(inputArray, left, right);
}

<summary>
// Internal recursive sort algorithm for quick sort
// using divide and conquer. Sorting is done based on pivot
</summary>
<param name="inputArray"></param>
<param name="left"></param>
<param name="right"></param>
private void InternalQuickSort(long[] inputArray, int left, int right)
{
    int pivotNewIndex = Partition(inputArray, left, right);
    long pivot = inputArray[(left + right) / 2];
    if (left < pivotNewIndex-1)
        InternalQuickSort(inputArray, left, pivotNewIndex - 1);
    if (pivotNewIndex < right)
        InternalQuickSort(inputArray, pivotNewIndex, right);
}

//This operation returns a new pivot everytime it is called recursively
//and swaps the array elements based on pivot value comparison
private int Partition(long[] inputArray, int left, int right)
{
    int i = left, j = right;
    long pivot = inputArray[(left + right) / 2];

    while (i <= j)
    {
        while (inputArray[i] < pivot)
            i++;
        while (inputArray[j] < pivot)
            j--;
        if (i <= j)
        {
            SwapWithTemp(ref inputArray[i], ref inputArray[j]);
            i++; j--;
        }
    }
    return i;
}

Selection Sorts

In the Selection sorts category, again we have two familiar algorithms. One is 'Selection Sort' itself and another is 'Heap Sort'.

Selection Sort

It is essentially an in-place comparison sort. It has O(n2) complexity, making it inefficient on large lists, and generally performs worse than the similar Insertion Sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations. It performs better than Bubble Sort in almost all cases.
  • Best case - O(n^2)
  • Average case - O(n^2)
  • Worst case - O(n^2)
  • Stability - depends on the implementation
private void SelectionSort(long[] inputArray)
{
    long index_of_min = 0;
    for (int iterator = 0; iterator < inputArray.Length - 1; iterator++)
    {
        index_of_min = iterator;
        for (int index = iterator+1; index < inputArray.Length; index++)
        {
            if (inputArray[index] < inputArray[index_of_min])
                index_of_min = index;
        }
        Swap(ref inputArray[iterator], ref inputArray[index_of_min]);
    }
}

Heap Sort

It is a Comparison-based sorting algorithm, and is part of the Selection Sort family. Although somewhat slower in practice on most machines than a good implementation of Quick Sort, it has the advantage of a more favorable worst-case (n log n) runtime. Heap Sort is an in-place algorithm, but is not a stable sort. Heap Sort works as its name suggests. It begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements.
Heap Sort inserts the input list elements into a heap data structure. The largest value (in a max-heap) is extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.
  • Best case - O(n log n)
  • Average case - O(n log n)
  • Worst case - O(n log n)
  • Stability - unstable
private void HeapSort(long[] inputArray)
{
    for (int index = (inputArray.Length / 2) - 1; index >= 0; index--)
        Heapify(inputArray, index, inputArray.Length);
    for (int index = inputArray.Length - 1; index >= 1; index--)
    {
        SwapWithTemp(ref inputArray[0], ref inputArray[index]);
        Heapify(inputArray, 0, index - 1);
    }
}
This function internally calls the Heapify() function as shown above, which builds a heap data structure out of array contents using one of the special properties of heap, which says if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node (max-heap).
private void Heapify(long[] inputArray, int root, int bottom)
{
    bool completed = false;
    int maxChild;

    while((root*2 <= bottom) && (!completed))
    {
        if (root * 2 == bottom)
            maxChild = root * 2;
        else if (inputArray[root * 2] > inputArray[root * 2 + 1])
            maxChild = root * 2;
        else
            maxChild = root * 2 + 1;
        if (inputArray[root] < inputArray[maxChild])
        {
            SwapWithTemp(ref inputArray[root], ref inputArray[maxChild]);
            root = maxChild;
        }
        else
        {
            completed = true;
        }
    }
}

Insertion sorts

Our next category is Insertion sorts, where we cover two algorithms: 'Insertion Sort' itself and 'Shell Sort'. Let's have a look.

Insertion Sort

It is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists, and often is used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. Shell Sort is a variant of Insertion Sort, which is more efficient for larger lists because in arrays, insertion is expensive and requires shifting of all elements over by one.
  • Best case - O(n)
  • Average case - O(n^2)
  • Worst case - O(n^2)
  • Stability - stable
private void InsertionSort(long[] inputArray)
{
    long j = 0 ;
    long temp = 0 ;
    for (int index = 1; index < inputArray.Length; index++)
    {
        j = index;
        temp = inputArray[index];
        while ((j > 0) && (inputArray[j - 1] > temp))
        {
            inputArray[j] = inputArray[j - 1];
            j = j - 1;
        }
        inputArray[j] = temp;
    }
}

Shell Sort

It improves upon Bubble Sort and Insertion Sort by moving out of order elements more than one position at a time. It compares elements separated by a gap of several positions. This lets an element take "bigger steps" toward its expected position. Multiple passes over the data are taken with smaller and smaller gap sizes. The last step of Shell Sort is a plain Insertion Sort, but by then, the array of data is guaranteed to be almost sorted.
  • Best case - O(n)
  • Average case - depends on gap sequence
  • Worst case - O(n^2) or O(nlog^2 n) depending on gap sequence
  • Stability - unstable
private void ShellSort(long[] inputArray)
{
    long j, temp = 0;
    int increment = (inputArray.Length) / 2;
    while (increment > 0)
    {
        for (int index = 0; index < inputArray.Length; index++)
        {
            j = index;
            temp = inputArray[index];
            while ((j >= increment) && inputArray[j - increment] > temp)
            {
                inputArray[j] = inputArray[j - increment];
                j = j - increment;
            }
            inputArray[j] = temp;
        }
        if (increment / 2 != 0)
            increment = increment / 2;
        else if (increment == 1)
            increment = 0;
        else
            increment = 1;
    }
}

Merge sorts

In this category, I have covered only one sort which is the common amongst all, and that is Merge Sort.

Merge Sort

It takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on; until at last two lists are merged into the final sorted list. In most implementations, it is stable, meaning that it preserves the input order of equal elements in the sorted output. It is an example of the divide and conquer algorithmic paradigm.
  • Best case - O(n) or O(n log n)
  • Average case - O(n log n)
  • Worst case - O(n log n)
  • Stability - depends on the implementation (if the in-place merging can be made stable, then this will be stable)
private void MergeSort(long[] inputArray)
{
    int left = 0;
    int right = inputArray.Length - 1;
    InternalMergeSort(inputArray, left, right);
}
InternalMergeSort is a recursive function which sorts the left and the right contents recursively, the code of which is straightforward and shown below.
private void InternalMergeSort(long[] inputArray, int left, int right)
{
    int mid = 0;
    if (left < right)
    {
        mid = (left + right) / 2;
        InternalMergeSort(inputArray, left, mid);
        InternalMergeSort(inputArray,(mid + 1), right);
        MergeSortedArray(inputArray, left, mid, right);
    }
}
The final step in the merge sort is to merge two sorted arrays (in the previous step). MergeSortedArray is the function which does exactly the same every time, recursively.
private void MergeSortedArray(long[] inputArray, int left, int mid, int right)
{
    int index = 0;
    int total_elements = right-left+1; //BODMAS rule
    int right_start = mid + 1;
    int temp_location = left;
    long[] tempArray = new long[total_elements];

    while ((left <= mid) && right_start <= right)
    {
        if (inputArray[left] <= inputArray[right_start])
        {
            tempArray[index++] = inputArray[left++];
        }
        else
        {
            tempArray[index++] = inputArray[right_start++];
        }
    }
    if (left > mid)
    {
        for(int j = right_start; j <= right; j++)
            tempArray[index++] = inputArray[right_start++];
    }
    else
    {
        for(int j = left; j <= mid; j++)
            tempArray[index++] = inputArray[left++];
    }
    //Array.Copy(tempArray, 0, inputArray, temp_location, total_elements);
    // just another way of accomplishing things (in-built copy)
    for (int i = 0, j = temp_location; i < total_elements; i++, j++)
    {
        inputArray[j] = tempArray[i];
    }
}

Stability of sorting algorithms

A sorting algorithm is stable if it maintains the relative ordering of records with equal keys. I.e., if there are equal keys (say that key is K) and there are two different records associated with that same key ( K-> A) and (K-> B), and if in the original unsorted list, A appears before B, then after applying the sorting algorithm, if their order is retained such as A appears before B only, the algorithm is stable. In case all keys are different, then this distinction is not necessary. Also, in case of any types which are indistinguishable such as integers or where the entire element is the key, stability does not come into picture.
For example, say we have this set as {6,8}, {2,4}, {6,5}, {4,7} and if we want to sort with the first component, then two different results can be achieved as:
  1. {2,4}, {4,7}, {6,8}, {6,5} and
  2. {2,4}, {4,7}, {6,5}, {6,8}
In this case a points to the stability of the sorting algorithm where the original order of equal keyed elements is maintained and b points to the unstable algorithm where the order 'may' be changed.
However, note that unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original data order as a tie-breaker. Remembering this order, however, often involves an additional computational cost.
In the above mentioned sorting algorithms, Bubble, Insertion, and Merge Sorts are stable ones, whereas, Heap and Shell Sorts are unstable. In the case of Selection, Merge, and Quick Sorts, the stability of these algorithms is typically decided by the way they have been implemented.


No comments:

Post a Comment