Heaps

excerpt: What is a complete binary tree? What is heap data structure? How to maintain the heap structural property? How to maintain the heap property? tags: swap

Heapsort uses heap data structure for O(N log N) run time. This algorithm is useful for large arrays. Heap data structure is used to store a complete binary tree in an array.

Complete Binary Tree

A binary tree is a tree where very node is connected to, at most two children. In a complete tree all the tree’s levels are completely filled, except possibly the last level, where all the nodes are pushed to the left. The diagram shows a complete binary tree holding 12 nodes. The first three levels of the tree is full. The fourth level contain five nodes pushed to the left side of the tree.

The complete binary tree can be stored in an array. There will be no gaps between elements in the array since the nodes at the last level are pushed to the left. A simple forumula can be used to store the nodes in an array. The root is is stored at index 0. Then, for any node with index i, the children are stored at indices 2 x i + 1 and 2 x i + 2.

A node with index i has its parent at index of the floor value of (i - 1)/2. The floor will truncate the result to the next smallest integer by rounding down the value. For example, the floor of 3.9 is 3 and floor of 3 is also 3.

The diagram shows the tree shown in the first diagram stored in an array. The indices of the array is shown on top.

The value 6 is at index 4, so its children are at indices 4 x 2 + 1 = 9 and 4 x 2 + 2 = 10. The values at those indices are 5 and 12. This can be verified by looking at the first diagram.

If the index of a child is greater than the largest index in the array, the node does not have that child in the tree. For example, the value 9 has index 5, its right child has index 2 x 5 + 2 = 12, which is beyond the end of the array. The first diagram shows that 9 has no right child.

Consider the item with value 12 stored at index 10. The index of the parent is floor value of (10-1)/2 (floor of 4.5) = 4. The value at index 4 is 6. The tree diagram shows that the node with value 12 has a parent node with value 6.

Heaps

The heap shown in the diagram below is a complete binary tree where every node holds a value that is at least as large as the values in all its children. The first diagram is not a heap because it does not maintain the heap property. The root node has a value of 7 and its right has a value of 10 which is greater.

A heap can be built one node at a time. Start with a single node, because it has no children, it satisfies the heap property. A new node is added to the end of the tree. There is only one place where a node can be added to keep the tree a complete binary tree - at the left most side of the bottom level of the tree.

Compare the new value to the value of its parent. If the new value is larger than the parent’s value, swap them. The top of the heap now has the largest value. This preserves the heap property at this point. However, changing the value of the parent node might break the heap property farther up in the tree. Move up the tree to the parent node and compare its value to the value of its parent, swapping their values if necessary. Continue up the tree, swapping values if necessary, until you reach the root node. At that poin the tree is again a heap.

The diagram shows this process when 12 is added to the tree. The new heap is shown in the next diagram.

Array makes this process easy because adding a new item to the end of the tree means it is already in the proper position in the array. The code below shows how to convert an array into a heap.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
def build_heap(array)
  # add each item to the heap one at a time
  for i in (0..array.size-1)
    index = i
    while index != 0
      # Find the parent's index
      parent = ((index - 1)/2).floor
      # if child <= parent, we are done
      # so, break out of the while loop
      if array[index] <= array[parent]
        break
      end
      
      # swap the parent and child
      array[parent], array[index] = array[index], array[parent]
      
      # move to the parent
      index = parent
    end
  end
end

Heaps are useful for creating priority queues because the largest item in the tree is always at the root node. The item at the root is removed to extract the maximum value from the queue. This breaks the heap by splitting them into two trees. The last item in the tree can be moved to the root to make it one tree. The structural property is now satisfied but this breaks the heap property.

This can be fixed by using the same method used to build the heap. If the new value is smaller than one of its child values, swap it with the larger child. That fixes the heap property at this node, but it might break it at the child’s level, so move down to that node and repeat the process. Continue swapping the node down into the tree until the heap property is satisifed or the bottom of the tree is reached.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
def extract_max(array)
  size = array.size
  # Save the top item to return later
  result = array[0]
  
  # Move the last item to the root
  array[0] = array[size - 1]
  
  # Restore the heap property
  index = 0
  
  while true
    # Find the child indices
    left = 2 * index + 1
    right = 2 * index + 1
    
    # If a child index is off the end of the tree
    # use the parent's index
    left = index if left >= size
    right = index if right >= size
    
    # If the heap property is satisfied,
    # we are done, so break out of the while loop
    if array[index] >= array[left] && array[index] >= array[right]
      break
    end
    
    # Get the index of the child with the larger value
    if array[left] > array[right]
      swap_child = left
    else
      swap_child = right
    end
    
    # Swap with the larger child
    array[index], array[swap_child] = array[swap_child], array[index]
    
    # Move to the child node
    index = swap_child
  end

  return result  
end

The extract_max method finds the size of the tree from the size of the array parameter. This can be used to find the location where the heap ends within the array.

The value of the root is saved in the result variable, this is returned at the end of the method. The last item in the tree is moved to the root node. The index variable is set to the index of the root node. The infinite while loop does the repeated work.

Inside the while loop, the indices of the children of the current node is calculated. If either of those indices is off the end of the tree, it is set to the current node’s index. In that case, when the node’s values are compared later, the current node’s value is compared to itself. Because any value is greater than or equal to itself, that comparison satisfies the heap property, so the missing node does not make the algorithm swap values.

After caculating the child indices, we check whether the heap property is satisifed at this point. If it is, we break out of the while loop. If both child nodes are missing or if one is missing and the other satisifes the heap property, the while loop ends.

If the heap property is not satisfied, the swap_child is set to the index of the child that holds the larger value and swaps the parent node’s value with the child node’s value. The index variable is then updated to move down to the swapped child node and we continue to move down the tree.