Binary Search Trees

excerpt: This article covers building a binary search tree and its time complexity. tags: two-way-comparison

A binary search tree is a data structure used for storing data. The data in each node of the binary tree is a pair consisting of a key and item. It allows us to carry out operations such as search, insert and remove elements efficiently.

The binary search tree is arranged so that an inorder traversal processes them in sorted order. Each node’s value is larger than the value of its left child and less than or equal to the value of its right child. This is binary search tree property. It implies that every subtree of a binary search tree is also a binary search tree. This property holds for every node of the binary search tree.

The diagram shows a binary search tree. A node’s value lies between the values of its left child and its right child.

Binary Search Tree ADT

We can define the operations of a BST without knowing how they are implemented. The operations are:

find(key) - Returns the node if it is found otherwise null find_min - Returns the node with the smallest key find_max - Returns the node with max value add(key) - Add a node with the given key value to the binary search tree remove(key) - Remove a node with the given key value in the binary search tree

Add Nodes

A node with a given key k must be inserted in a binary search tree such that the resulting tree also satisfies the bst property. A binary search tree can be built by adding nodes to the tree. To add a value to a node’s subtree, compare the new value to the node’s value and recursively move down the left or right branch as appropriate. When a missing branch is found, add the new value.

The size of the problem is the height of the tree. The simplest base case occurs when the tree is empty. In this case, there is only one node in the tree that does not have any children. The logic used for search will be used for solving this problem.

There are two possible scenarios that lead to a base case and a recursive case. If the root node does not have a left subtree, then simply replace the associated empty list by the node containing the key and the item (and two empty subtees). However, if the root node does have a subtree then the method must insert the new node in that subtree, which leads to a recursive call. The reasoning for the right subtree is similar.

If the tree is not empty, then a node is inserted in the left subtree if k is less than the root value. Otherwise, it will be inserted in the right subtree.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# Add a node to this node's sorted subtree
def add(key)
  # Check if this value is smaller than this node
  if key < value
    # The new value is smaller. Add it to the left subtree
    if left.nil?
      left = Node.new(key)
    else
      left.add(key)
    end
  else
    # The new value is greater than the value in this node.
    # Add it to the right subtree
    if right.nil?
      right = Node.new(key)
    else
      right.add(key)
    end
  end
end

This method is defined in the node class and the value property holds the node’s data. The new value is compared to the node’s value. If the new value is smaller than the node’s value, the new value is placed in the left subtree. If the left child is missing, the current node gets a new left child with the new value. If the node has a left child, recursively call the add method on the left child to place the new value in the left subtree.

If the new value is greater or equal than the node’s value, the new value is added in the right subtree. If the right child is missing, the current node gets a new right child and places the new value there. If the right child exists, the right child node’s add method is recursively called to place the new value in the right subtree.

A sentinel can be used at the top of the tree. The root node’s value can be set to something smaller than any possible value the tree might need to contain, nodes can be added to the tree without worrying about whether it is empty. All the nodes you add end up in the right subtree below the root.

Runtime Analysis

The run time depends on the structure of the tree. If the tree grows relatively short and wide, adding N nodes to the tree, it has height O(log N). When you add an item to the tree, you must search to the bottom of the tree and that takes O(log N) steps. Adding N nodes at O(log N) steps each makes the total time to build the tree O(N log N).

Roughly half of the nodes in a perfect binary tree are at the bottom of the tree. This means after adding half of the nodes of the tree, last level of the tree remains to be built, so it already has height log N - 1. The remaining half of the nodes must be added at a depth of log(N) - 1, so the total number of steps is N / 2 x log(N) - 1 = O(N log N).

If the values in the tree are initially randomly ordered, it results in a wide tree. However, if the value is added in some order, a tall thin tree will result. In the worst case, if you add the values in sorted or reverse sorted order, every node has a single child and the tree will have N nodes with height N.

In that case, after adding N / 2 nodes, the tree has height N / 2. So it will take more than N / 2 steps to add the remaining N / 2 nodes. The total run time is O(N/2 x N/2) = O(N2).

The add method can be used to implementing sorting algorithm called treesort. In the treesort algorithm, the node method is used to add values to a binary search tree. Then an inorder traversal can be used to output the items in sorted order. The add method takes O(N log N) and the inorder traversal takes O(N) time, so the total run time is O(N log N + N) = O(N log N). The worst case run time is O(N2). Because the worst case for building the tree is O(N2) and the inorder traversal is O(N) time, giving a total run time of O(N2 + N) = O(N2).