Heapsort Algorithm

title: Heapsort excerpt: What is heapsort? How is it implemented? What is the runtime of heapsort? Is heapsort an inplace sorting algorithm? What is the space complexity of heapsort?

Heapsort uses heap data structure for O(N log N) run time. This algorithm is useful for large arrays. Heap data structure is used to store a complete binary tree in an array.

Implementation

The first step in implementing heapsort is to build a heap. Repeatedly swap the first and last items in the heap and rebuild the heap excluding the last item. Every pass removes one item from the heap and adds that item to the end of the array where the items are in sorted order.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def heapsort(array)
  # Convert the array into a heap  
  size = array.size
  size.downto(0) do |i|
    # swap the root item and the last item
    array[0], array[i] = array[i], array[0]
    # consider the item in position i to be removed from the heap,
    # so the heap now holds i - 1 items. Push the new root value
    # down into the heap to restore the heap property.
  end
end

The first task is to turn the array of values into a heap. Then repeatedly remove the top item, which is the largest and move it to the end of the heap. This reduces the number of items in the heap and restores the heap property. The newly positioned item at the end of the heap is in its proper sorted order.

The items are removed from the heap in largest to smallest order and placed at the end of the ever-shrinking heap. The array now holds the values in smallest to largest order.

The data is stored inside the orginal array and the space complexity is O(N) since the array holds N values. The runtime of this algorithm must take the time needed to build the initial heap, where each item is added to a growing heap. Each item added to the heap is placed at the end of the tree and swapped with the parent until the tree is again a heap. Since the tree is a complete binary tree, it is O(log N) levels tall, so swimming the item up through the tree can take, at most, O(log N) steps. Adding an item and restoring the heap property is performed N times, so the total time to build the initial heap is O(N log N).

The sorting is performed by removing each item from the heap and then restoring the heap property. This is done by swapping the item in the heap and the root node and then swapping the new root down through the tree until the heap property is restored. The tree height is O(log N) so this can take up to O(log N) time. This step is repeated N times, so the total number of steps required is O(N log N).

The total time consists of the time needed to build the initial heap and the time to finish sorting: O(N log N) + O(N log N) = O(N log N). Heapsort is an inplace sorting algorithm that takes no extra storage. The run time of O(N log N) is asymptotically the fastest possible for an algorithm that sorts by using comparisons, the quickort algorithm usually runs slightly faster.

Practice Phase Template

Practice Phase Template for Heapsort Algorithm:

  1. Understand the problem: The goal of heapsort is to sort a list of elements. You will be using a data structure called a heap to achieve this goal. Heapsort works by first transforming the list into a max heap (a complete binary tree where each parent node is greater than or equal to its child nodes). Then, it repeatedly swaps the first number with the last, reduces the heap size by one, and sifts the new first number into its correct position in the heap. This continues until the heap is empty and the original list is sorted.

  2. Learn about heaps: Understand what a binary heap is and how it works. Know the difference between a max heap and a min heap, and how to insert elements and delete the root from a heap.

  3. Understand heapify: The heart of the heapsort algorithm is the heapify function, which moves elements around in the array to ensure that the heap property is restored after an element is removed. This involves comparing an element with its children and swapping it with one of its children if it is smaller (in a max heap).

  4. Learn about the heap construction phase: The first phase of heapsort involves building a max heap from the input data. This is usually done by running the heapify function in a loop that starts from the last parent node and works its way up to the root.

  5. Learn about the heap extraction phase: Once a max heap has been constructed, heapsort enters its second phase. This involves repeatedly swapping the root of the heap with the last element in the heap (which places the maximum element in its final sorted position), reducing the size of the heap by one, and then running heapify on the new root.

  6. Code the algorithm: Now that you understand all the steps of heapsort, try to code the algorithm yourself. This will likely involve writing helper functions to find a node’s parent and children, swap two elements, and sift a node down the heap. Then you’ll write a main function that builds the heap and then sorts it.

  7. Test your code: Test your code on various input data to make sure it correctly sorts the data. Try edge cases like an empty list or a list of identical elements.

  8. Optimize: Look for ways to make your code more efficient or cleaner. For instance, can you avoid some swaps or make fewer calls to heapify?

  9. Review: Reflect on what you’ve learned. How does heapsort compare to other sorting algorithms in terms of efficiency? When would it be a good idea to use heapsort, and when might other algorithms be a better choice?

  10. Practice: The best way to get better at coding algorithms is to practice. Try to implement heapsort from scratch again in a few days to see how much you remember. Then do it again a few weeks later. Each time, try to do it faster and make your code cleaner.

Targeted Drills in Python

Here are all the drills arranged from simplest to most complex:

  1. Drill 1 - Create an Empty Heap

    Task: Write a Python function to create an empty heap.

  2. Drill 2 - Insert into a Heap

    Task: Write a Python function that inserts an item into a heap. The function should maintain the heap property.

  3. Drill 3 - Remove Maximum from a Max Heap

    Task: Write a Python function that removes the maximum element from a max heap. The function should maintain the heap property.

  4. Drill 4 - Heapify an Array

    Task: Write a Python function that converts an array into a heap (heapify).

  5. Drill 5 - Build a HeapSort Function

    Task: Now, using all the functions you’ve built so far (insert, remove maximum, and heapify), create a function to sort an array using heapsort algorithm.

  6. Drill 6 - Testing and Debugging

    Task: Finally, spend time testing and debugging your functions. Try sorting different arrays, including edge cases like an empty array and an array with duplicate elements. See if your function correctly sorts them in increasing order.

As the complexity increases, each drill incorporates more aspects of the heapsort algorithm and requires a deeper understanding of how heaps work. Therefore, it’s important to ensure you understand each drill before moving on to the next. This progressive approach not only builds a solid understanding of heapsort but also equips you with practical Python programming skills.

Here are Python solutions for the given drills:

  1. Drill 1 - Create an Empty Heap

    Python has built-in support for heaps using the heapq module, but let’s create our own basic structure for learning purposes:

    1
    2
    
    def create_heap():
        return []
    
  2. Drill 2 - Insert into a Heap

    For this drill, we’re going to implement a simple insertion at the end of our heap (which we’re representing as a list), and then we’ll “bubble up” the inserted element to restore the heap property.

    1
    2
    3
    4
    5
    6
    7
    8
    
    def insert(heap, item):
        heap.append(item)
        i = len(heap) - 1
        parent = (i - 1) // 2
        while parent >= 0 and heap[i] > heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
            parent = (i - 1) // 2
    
  3. Drill 3 - Remove Maximum from a Max Heap

    Removing the maximum from a heap involves swapping the root with the last node, removing the last node (which is now the maximum), and then “sifting down” the swapped node to restore the heap property.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    
    def remove_max(heap):
        heap[0], heap[-1] = heap[-1], heap[0]
        max_value = heap.pop()
        sift_down(heap, 0)
        return max_value
    
    def sift_down(heap, i):
        left = 2 * i + 1
        right = 2 * i + 2
        largest = i
        if left < len(heap) and heap[left] > heap[largest]:
            largest = left
        if right < len(heap) and heap[right] > heap[largest]:
            largest = right
        if largest != i:
            heap[i], heap[largest] = heap[largest], heap[i]
            sift_down(heap, largest)
    
  4. Drill 4 - Heapify an Array

    To convert an array to a heap, we can use the sift down operation starting from the last non-leaf node to the root node.

    1
    2
    3
    
    def heapify(arr):
        for i in range(len(arr) // 2, -1, -1):
            sift_down(arr, i)
    
  5. Drill 5 - Build a HeapSort Function

    Finally, we can implement heapsort by first heapifying the input array and then repeatedly removing the maximum.

    1
    2
    3
    4
    5
    6
    7
    
    def heapsort(arr):
        heapify(arr)
        sorted_arr = []
        while arr:
            sorted_arr.append(remove_max(arr))
        sorted_arr.reverse()  # because we want the array to be sorted in ascending order
        return sorted_arr
    
  6. Drill 6 - Testing and Debugging

    Now that you have all the components of the heapsort algorithm, you can use them to sort a variety of different arrays. Test your heapsort function with different inputs to ensure it works correctly.

    1
    
    print(heapsort([4, 3, 2, 10, 12, 1, 5, 6, 15, 11]))
    

Remember, these drills are designed to teach you how to build and work with a heap data structure and to understand how the heapsort algorithm works. The Python heapq module provides a more efficient

Basics of Heaps

Drill 1 - Understand the Heap Property

Understanding the heap property is critical to using heaps. For a max heap, every parent node must be greater than or equal to its child nodes, while for a min heap, every parent node must be less than or equal to its child nodes.

Task: Draw a tree of 5 elements that satisfy the max heap property.

Drill 2 - Insert into a Heap

Insertion into a heap involves adding the element to the end of the array and then “bubbling it up” until the heap property is restored.

Task: Write a function in Python that can insert a number into a max heap.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def insert(heap, item):
    heap.append(item)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] < heap[i]:
            heap[parent], heap[i] = heap[i], heap[parent]
        else:
            break
        i = parent

Drill 3 - Remove Maximum from a Max Heap

Removing the maximum from a max heap (which is the root node) involves replacing the root with the last element in the array, and then “bubbling it down” until the heap property is restored.

Task: Write a function in Python that can remove the maximum from a max heap and return it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def extract_max(heap):
    heap[0], heap[-1] = heap[-1], heap[0]
    max_value = heap.pop()
    i = 0
    while True:
        left = 2 * i + 1 if 2 * i + 1 < len(heap) else i
        right = 2 * i + 2 if 2 * i + 2 < len(heap) else i
        if heap[i] >= heap[left] and heap[i] >= heap[right]:
            break
        largest = left if heap[left] > heap[right] else right
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
    return max_value

Drill 4 - Heapify an Array

Heapifying an array involves turning an unsorted array into a valid max heap.

Task: Write a function in Python that can turn an unsorted array into a max heap.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def heapify(arr):
    n = len(arr)
    for i in reversed(range(n//2)):
        sift_down(arr, i, n)
    return arr

def sift_down(heap, start, end):
    root = start
    while True:
        child = root * 2 + 1
        if child >= end:
            break
        if child + 1 < end and heap[child] < heap[child + 1]:
            child += 1
        if heap[root] < heap[child]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            break

Drill 5 - Practice with Heap Sort

Now that you understand heaps, you can practice with heap sort, a popular sorting algorithm that uses a binary heap.

Task: Implement heap sort in Python, using the heaps you’ve learned about.

1
2
3
4
5
6
7
8
def heap_sort(arr):
    heapify(arr)
    end = len(arr) - 1
    while end > 0:
        arr[end], arr[0] = arr[0], arr[end]
        sift_down(arr, 0, end)
        end -= 1
    return arr

With these drills, you should gain a solid understanding of heaps, how to manipulate them, and how to use them in sorting algorithms. Remember to test each function thoroughly to ensure it works as expected!

Maintaining the Heap Property

Maintaining the heap property is a critical aspect of the heap data structure and it is embedded in some of the drills that were provided.

For instance:

  • In Drill 2 - Insert into a Heap, after appending a new item to the heap, the function goes into a while loop. This loop continues swapping the new item with its parent as long as it’s greater than the parent. This is done to maintain the max heap property which states that for every node i, other than the root, the value of i is at most the value of its parent.
1
2
3
4
5
6
7
8
def insert(heap, item):
    heap.append(item)
    i = len(heap) - 1
    parent = (i - 1) // 2
    while parent >= 0 and heap[i] > heap[parent]:
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
        parent = (i - 1) // 2
  • In Drill 3 - Remove Maximum from a Max Heap, after swapping the root (max value) with the last node and removing it, we “sift down” the swapped node to restore the heap property. Here, we’re ensuring that every parent node is greater than its children.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def sift_down(heap, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < len(heap) and heap[left] > heap[largest]:
        largest = left
    if right < len(heap) and heap[right] > heap[largest]:
        largest = right
    if largest != i:
        heap[i], heap[largest] = heap[largest], heap[i]
        sift_down(heap, largest)
  • Drill 4 - Heapify an Array, the ‘sift_down’ function is used starting from the last non-leaf node to the root node. This is done to transform the array into a valid max heap.

Maintaining the heap property is an essential aspect of heap operations and is implicitly included in these drills.

Building a Heap

The process of creating a heap from an unsorted array is known as “building a heap.”

Here’s how it fits into the drills:

  • Drill 4 - Heapify an Array is the drill that directly covers the concept of building a heap. Here’s the Python code snippet for that drill:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def heapify(arr):
    n = len(arr)
    for i in range(n//2 - 1, -1, -1):
        sift_down(arr, i)

def sift_down(heap, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < len(heap) and heap[left] > heap[largest]:
        largest = left
    if right < len(heap) and heap[right] > heap[largest]:
        largest = right
    if largest != i:
        heap[i], heap[largest] = heap[largest], heap[i]
        sift_down(heap, largest)

In this drill, we start from the last non-leaf node (n//2 - 1) and move up to the root node (0), and for each node, we perform the ‘sift_down’ operation. This ‘sift_down’ operation ensures that the subtree rooted at that node satisfies the heap property.

The process of building a heap is crucial in the heap sort algorithm as it’s the first step in the algorithm where we transform the unsorted array into a max heap.