Order in the Data - A Guide to Fundamental Sorting and Searching Algorithms
Hey everyone, and welcome back to the blog! The digital world, as we know it, thrives on organization. From your meticulously categorized music library to the lightning-fast search results on your favorite search engine, efficient data manipulation is the backbone of modern computing. Two of the most fundamental operations in this digital symphony are Sorting and Searching.
Sorting algorithms arrange items in a specific order, while searching algorithms help us find a specific item within a collection. Both are crucial for ensuring information retrieval and organization happen seamlessly. Today, we'll delve into the fascinating world of these algorithms, exploring their inner workings, complexity analysis, and practical implementations.
Defining Our Terms: Sorting vs. Searching
- Sorting: This is the process of arranging items in a collection into a specific order, such as numerical order (ascending or descending) or lexicographical (alphabetical) order.
- Why sort? Organized data is much easier and faster to search, retrieve, and analyze. It's a fundamental operation with numerous applications in database management, data analysis, and information retrieval.
- Searching: This is the process of finding a specific item (the "target") within a collection of items.
- Why search? It's essential for locating specific data within a dataset, whether it's finding a particular record in a database or a file on your computer.
Where Do We See Them? Applications
Sorting and searching algorithms are ubiquitous in computer science and have a wide range of applications:
- Database Management: Sorting is used to organize data in databases, making queries (which involve searching) much faster. Indexes often rely on sorted structures.
- Information Retrieval: Searching algorithms are the core of any system that needs to locate specific information, like finding records or documents.
- Web Search Engines: Sophisticated searching and indexing algorithms find web pages relevant to a user's query; ranking algorithms (such as Google's PageRank) then order the results.
- Operating Systems: Used to locate files and directories (e.g., the Unix `find` command) and manage processes.
- Artificial Intelligence: Various AI applications use sorting and searching, such as pathfinding algorithms in robotics and game playing, or for organizing data for machine learning models.
A Tour of Common Algorithms
Let's explore some well-known algorithms for sorting and searching.
Sorting Algorithms
1. Bubble Sort
- Concept: A simple comparison-based algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The passes through the list are repeated until no swaps are needed, which indicates that the list is sorted.
- Pseudocode:
```
function BubbleSort(A):
    n = A.length
    for i = 0 to n - 2:              // Outer loop for passes
        swapped = false
        for j = 0 to n - i - 2:      // Inner loop for comparisons in current pass
            if A[j] > A[j + 1]:
                swap(A[j], A[j + 1])
                swapped = true
        if not swapped:              // If no swaps in a pass, the array is sorted
            break
```
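If you want to try this out, here's a minimal, runnable Python version of the pseudocode above (sorting in place, ascending order):

```python
def bubble_sort(a: list) -> None:
    """Sort the list a in place using bubble sort."""
    n = len(a)
    for i in range(n - 1):            # One pass per outer iteration
        swapped = False
        for j in range(n - i - 1):    # The last i elements are already in place
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:               # No swaps means the list is sorted
            break

nums = [5, 1, 4, 2, 8]
bubble_sort(nums)
print(nums)  # [1, 2, 4, 5, 8]
```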
2. Selection Sort
- Concept: Another simple comparison-based algorithm. It divides the input list into two parts: a sorted sublist (initially empty) and an unsorted sublist. It repeatedly selects the minimum (or maximum, depending on order) element from the unsorted sublist and moves it to the end of the sorted sublist.
- Pseudocode:
```
function SelectionSort(A):
    n = A.length
    for i = 0 to n - 2:              // Iterate through the unsorted part
        minIndex = i
        for j = i + 1 to n - 1:      // Find the minimum in the remaining unsorted part
            if A[j] < A[minIndex]:
                minIndex = j
        if minIndex != i:            // Swap the minimum into position i
            swap(A[i], A[minIndex])
```
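And the same idea as a small Python sketch:

```python
def selection_sort(a: list) -> None:
    """Sort the list a in place using selection sort."""
    n = len(a)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):     # Find the minimum of the unsorted part
            if a[j] < a[min_index]:
                min_index = j
        if min_index != i:            # Move it to the end of the sorted part
            a[i], a[min_index] = a[min_index], a[i]
```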
3. Insertion Sort
- Concept: A simple comparison-based algorithm that builds the final sorted array (or list) one item at a time. It iterates through the input elements and, for each element, "inserts" it into its correct position in the already sorted part of the list.
- Pseudocode:
```
function InsertionSort(A):
    n = A.length
    for i = 1 to n - 1:              // Start from the second element
        key = A[i]                   // The element to be inserted
        j = i - 1                    // Index of the last element in the sorted portion
        // Shift elements of A[0..i-1] that are greater than key
        // one position to the right
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j = j - 1
        A[j + 1] = key               // Place key in its correct sorted position
```
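A compact Python rendering of that loop:

```python
def insertion_sort(a: list) -> None:
    """Sort the list a in place using insertion sort."""
    for i in range(1, len(a)):
        key = a[i]                    # Element to insert
        j = i - 1
        while j >= 0 and a[j] > key:  # Shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                # Drop key into its slot
```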
4. Merge Sort
- Concept: An efficient, comparison-based sorting algorithm that follows the divide-and-conquer paradigm.
- Divide: The input list is divided into two (roughly) equal sublists.
- Conquer: The sublists are recursively sorted using Merge Sort.
- Combine: The two sorted sublists are merged back together to produce a single, final sorted list.
- Pseudocode:
```
function MergeSort(A, left_index, right_index):
    if left_index < right_index:                      // Condition for recursion
        mid_index = floor((left_index + right_index) / 2)
        MergeSort(A, left_index, mid_index)           // Sort left sub-array
        MergeSort(A, mid_index + 1, right_index)      // Sort right sub-array
        Merge(A, left_index, mid_index, right_index)  // Merge the two sorted sub-arrays

function Merge(A, l, m, r):          // l = left, m = mid, r = right
    n1 = m - l + 1                   // Size of left temporary array
    n2 = r - m                       // Size of right temporary array

    // Create temporary arrays and copy the data into them
    L_temp = new Array(n1)
    R_temp = new Array(n2)
    for i = 0 to n1 - 1:
        L_temp[i] = A[l + i]
    for j = 0 to n2 - 1:
        R_temp[j] = A[m + 1 + j]

    // Merge the temporary arrays back into A[l..r]
    i = 0    // Index into L_temp
    j = 0    // Index into R_temp
    k = l    // Index into the merged sub-array
    while i < n1 and j < n2:
        if L_temp[i] <= R_temp[j]:
            A[k] = L_temp[i]
            i = i + 1
        else:
            A[k] = R_temp[j]
            j = j + 1
        k = k + 1

    // Copy any remaining elements of L_temp
    while i < n1:
        A[k] = L_temp[i]
        i = i + 1
        k = k + 1

    // Copy any remaining elements of R_temp
    while j < n2:
        A[k] = R_temp[j]
        j = j + 1
        k = k + 1
```
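Here's a short Python sketch of the same divide-and-conquer idea. For brevity it returns a new list instead of merging in place like the pseudocode, but the merge step is identical:

```python
def merge_sort(a: list) -> list:
    """Return a new sorted list (top-down merge sort)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])        # Recursively sort the left half
    right = merge_sort(a[mid:])       # Recursively sort the right half
    # Merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # At most one of these has leftovers
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```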
5. Quick Sort
- Concept: Another efficient, comparison-based sorting algorithm that also follows the divide-and-conquer paradigm.
- Pick a Pivot: Select an element from the array as a "pivot."
- Partition: Reorder the array so that all elements with values less than the pivot come before it, while all elements with values greater than the pivot come after it. After this partitioning, the pivot is in its final sorted position.
- Conquer: Recursively apply Quick Sort to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.
- Pseudocode:
```
function QuickSort(A, low_index, high_index):
    if low_index < high_index:
        // pi is the partitioning index; A[pi] is now in its final place
        pi = Partition(A, low_index, high_index)
        QuickSort(A, low_index, pi - 1)   // Recursively sort elements before the partition
        QuickSort(A, pi + 1, high_index)  // Recursively sort elements after the partition

// Takes the last element as pivot, places it at its correct position
// in the sorted array, and puts all smaller elements to its left
// and all greater elements to its right
function Partition(A, low, high):
    pivot_value = A[high]        // Choose the last element as pivot
    i = low - 1                  // Index of the last "smaller" element
    for j = low to high - 1:
        if A[j] < pivot_value:   // Current element is smaller than the pivot
            i = i + 1
            swap(A[i], A[j])
    swap(A[i + 1], A[high])      // Move the pivot to its correct position
    return i + 1                 // Return the partitioning index
```
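A runnable Python version using the same last-element (Lomuto) partition scheme:

```python
def quick_sort(a: list, low: int, high: int) -> None:
    """Sort a[low..high] in place using quicksort."""
    if low < high:
        pi = partition(a, low, high)
        quick_sort(a, low, pi - 1)    # Sort elements before the pivot
        quick_sort(a, pi + 1, high)   # Sort elements after the pivot

def partition(a: list, low: int, high: int) -> int:
    """Lomuto partition: last element as pivot, returns its final index."""
    pivot = a[high]
    i = low - 1                       # Boundary of the "smaller" region
    for j in range(low, high):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[high] = a[high], a[i + 1]   # Pivot to its final position
    return i + 1

nums = [10, 7, 8, 9, 1, 5]
quick_sort(nums, 0, len(nums) - 1)
print(nums)  # [1, 5, 7, 8, 9, 10]
```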
6. Heap Sort
- Concept: A comparison-based sorting algorithm that uses a Binary Heap data structure.
- Build Heap: Build a max-heap (or min-heap for descending sort) from the input data. In a max-heap, the largest element is at the root.
- Sort: Repeatedly extract the maximum element from the heap (which is the root) and place it at the end of the sorted portion of the array. Then, heapify the remaining elements to restore the heap property.
- Pseudocode:
```
function HeapSort(A):
    n = A.length
    // Build a max heap: heapify every non-leaf node,
    // starting from the last one and working back to the root
    for i = floor(n / 2) - 1 down to 0:
        Heapify(A, n, i)
    // One by one, extract elements from the heap
    for i = n - 1 down to 1:
        swap(A[0], A[i])     // Move current root (max element) to the end
        Heapify(A, i, 0)     // Restore the heap property on the reduced heap of size i

// Heapify the subtree rooted at index i; n is the size of the heap
function Heapify(A, n, i):
    largest = i                  // Initialize largest as root
    left_child = 2 * i + 1
    right_child = 2 * i + 2
    if left_child < n and A[left_child] > A[largest]:
        largest = left_child     // Left child is larger than root
    if right_child < n and A[right_child] > A[largest]:
        largest = right_child    // Right child is larger than largest so far
    if largest != i:
        swap(A[i], A[largest])
        Heapify(A, n, largest)   // Recursively heapify the affected subtree
```
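The same two-phase process (build the heap, then repeatedly extract the maximum) in Python:

```python
def heapify(a: list, n: int, i: int) -> None:
    """Restore the max-heap property for the subtree rooted at i (heap size n)."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)            # Fix the subtree we disturbed

def heap_sort(a: list) -> None:
    """Sort the list a in place using heap sort."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # Build the max heap
        heapify(a, n, i)
    for i in range(n - 1, 0, -1):         # Repeatedly extract the max
        a[0], a[i] = a[i], a[0]
        heapify(a, i, 0)
```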
7. Radix Sort
- Concept: A non-comparison-based sorting algorithm that sorts integers by processing individual digits. It sorts digit by digit, from the least significant digit to the most significant digit (LSD Radix Sort) or vice-versa (MSD Radix Sort). It often uses a stable sorting algorithm like Counting Sort as a subroutine for sorting based on each digit.
- Pseudocode (LSD Radix Sort using Counting Sort):
```
function RadixSort(A):
    max_val = getMax(A)          // The maximum value determines the number of digits
    // Do a counting sort for every digit. Instead of passing the digit
    // number, we pass exp, which is 10^i for the current digit i.
    exp = 1
    while floor(max_val / exp) > 0:
        CountingSortByDigit(A, exp)
        exp = exp * 10

function getMax(A):              // Helper to find the maximum value
    max_v = A[0]
    for i = 1 to A.length - 1:
        if A[i] > max_v:
            max_v = A[i]
    return max_v

// Counting sort of A[] according to the digit represented by exp
function CountingSortByDigit(A, exp):
    n = A.length
    output = new Array(n)        // Output array
    count = new Array(10)        // Count array for digits 0-9
    for i = 0 to 9:
        count[i] = 0
    // Store the count of occurrences of each digit
    for i = 0 to n - 1:
        digit = floor(A[i] / exp) % 10
        count[digit] = count[digit] + 1
    // Prefix sums: count[i] now contains the final position
    // of digit i in output[]
    for i = 1 to 9:
        count[i] = count[i] + count[i - 1]
    // Build the output array; iterating from the end keeps the sort stable
    for i = n - 1 down to 0:
        digit = floor(A[i] / exp) % 10
        output[count[digit] - 1] = A[i]
        count[digit] = count[digit] - 1
    // Copy output back into A, now sorted by the current digit
    for i = 0 to n - 1:
        A[i] = output[i]
```
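A Python sketch of LSD Radix Sort; note that it assumes non-negative integers, the usual precondition for this digit-by-digit approach:

```python
def radix_sort(a: list) -> None:
    """Sort a list of non-negative integers in place (LSD radix sort)."""
    if not a:
        return
    max_val = max(a)
    exp = 1
    while max_val // exp > 0:             # One counting-sort pass per digit
        counting_sort_by_digit(a, exp)
        exp *= 10

def counting_sort_by_digit(a: list, exp: int) -> None:
    """Stable counting sort of a by the digit selected by exp."""
    n = len(a)
    output = [0] * n
    count = [0] * 10                      # One bucket per digit 0-9
    for value in a:
        count[(value // exp) % 10] += 1
    for d in range(1, 10):                # Prefix sums give final positions
        count[d] += count[d - 1]
    for value in reversed(a):             # Reverse pass keeps it stable
        digit = (value // exp) % 10
        count[digit] -= 1
        output[count[digit]] = value
    a[:] = output                         # Copy back in place
```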
Searching Algorithms
1. Linear Search
- Concept: A simple search algorithm that sequentially checks each element of a list until the target element is found or the end of the list is reached. Also known as sequential search.
- Pseudocode:
```
function LinearSearch(A, target_value):
    for i = 0 to A.length - 1:
        if A[i] == target_value:
            return i     // Return the index if found
    return -1            // Return -1 if not found
```
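In Python this is little more than a loop with an early return:

```python
def linear_search(a: list, target) -> int:
    """Return the index of target in a, or -1 if it's absent."""
    for i, value in enumerate(a):
        if value == target:
            return i
    return -1
```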
2. Binary Search
- Concept: An efficient search algorithm that finds the position of a target value within a sorted array (or list). It repeatedly compares the target value with the middle element of the array. If they match, the search is successful. If the target is less than the middle element, the search continues in the lower half; otherwise, it continues in the upper half.
- Pseudocode (Iterative):
```
function BinarySearch(A, target_value):   // A must be sorted
    low = 0
    high = A.length - 1
    while low <= high:
        mid = floor((low + high) / 2)
        if A[mid] == target_value:
            return mid                    // Target found at index mid
        else if A[mid] < target_value:
            low = mid + 1                 // Search the right half
        else:                             // A[mid] > target_value
            high = mid - 1                // Search the left half
    return -1                             // Target not found
```
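A Python version of the iterative pseudocode (in real projects you'd typically reach for the standard-library `bisect` module, but writing it out makes the halving explicit):

```python
def binary_search(a: list, target) -> int:
    """Return the index of target in the sorted list a, or -1 if absent."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2           # Python ints don't overflow here
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            low = mid + 1                 # Target is in the right half
        else:
            high = mid - 1                # Target is in the left half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```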
Time and Space Complexity Summary
Here's a table summarizing the complexities and stability of these algorithms:
| Algorithm | Best Case Time | Average Case Time | Worst Case Time | Space Complexity | Stable? |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) (avg) | No |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| Radix Sort | O(nk) | O(nk) | O(nk) | O(n + k) | Yes |
| Linear Search | O(1) | O(n) | O(n) | O(1) | N/A |
| Binary Search | O(1) | O(log n) | O(log n) | O(1) (iterative) | N/A |
(Where n is the number of elements and, for Radix Sort, k is the number of digits in the maximum number. Stability refers to whether the algorithm maintains the relative order of equal elements.)
Conclusion
Sorting and searching are truly fundamental operations that every software developer encounters. While simple algorithms like Bubble Sort or Linear Search are easy to understand, their performance can degrade quickly with larger datasets. More advanced algorithms like Merge Sort, Quick Sort, Heap Sort, and Binary Search offer significantly better efficiency for larger inputs. Non-comparison sorts like Radix Sort can be even faster under specific conditions.
The choice of algorithm always depends on the specific requirements of your application, including the size of the dataset, the distribution of the data, memory constraints, and whether the sort needs to be stable. By understanding the properties and the time and space complexities of these core algorithms, developers can make informed decisions and build efficient, performant software.