Decision Trees

excerpt: Decision trees are used to model problems that are solved by making a series of decisions. tags: decision-trees

Decision trees are used to model problems that are solved by making a series of decisions. Each branch in the tree represents a single choice. A leaf node represents a complete set of decisions that produces a final solution. The goal is to find the best possible set of choices. This is the best leaf node in the tree.

Searching Decision Trees

In the partition problem, there is a collection of objects of a given weight and the goal is to divide them into two groups that have the same total weight. For small number of objects, this problem can be solved manually. It becomes hard to solve for a large number of objects with different weights.

A binary decision tree can model the process of deciding which objects go in which group. A left branch represents putting the object in the first group and a right branch represents putting the objects in the second group.

The diagram shows the complete decision tree for a partition problem with four objects having weights 2, 4, 1 and 1. A path through the tree represents a complete assignment of objects to the two groups. For example, the path that follows the root’s left branch and then the next three right branches puts the first object in the first group and the other objects in the second group. The numbers below the tree show the total weights of the two groups - in this case 2 and 6.

Only two leaf nodes correspond to dividing the objects’ weights equally so that both groups have a total weight of 4. The two solutions are the same solution with the objects in the two groups switched.

The time complexity is O(2n) where n is the number of objects. The general formula is O(bn) where b is the branching factor. In this case, the branching factor is 2.

Optimization Problems

Partition problems have two different forms. The first form is finding if a particular solution is possible. The second form is finding the best possible solution.

The first form of partition problem is decision about whether the objects can be divided into two groups with equal total weights. This requires a yes-or-no answer. The second form is about dividing the objects into two groups with total weights as close to equal as possible. The second form is an optimization problem because objects can be divided in many ways and the goal is to find the optimum division.

Exhaustive search visits all the nodes in a decision tree searching for the best solution. Tree data structure is not used to search. We need a way to keep track of where we are in the tree. Recursion can keep track of their positions in the tree and pick different branches of the tree.

This method is slow since the entire decision tree has to be searched. We can make one improvement to shorten the search. If we reach leaf node where we find one of the solution, we can stop without searching the rest of the decision tree. If there are many optimal solutions, this can speed up the search to find a solution.

Branch and Bound

Branch and bound is a technique for searching trees more effectively. After moving down a branch in the tree, the bounding function calculates the best possible outcome that can be achieved down that branch. If the best possible outcome will not be an improvement over the best solution that has already been found, that branch is abondoned and the processing does not continue down its subtree. This can save lot of time.

This is similar to the exhaustive search except that it determines whether the test solution can be improved enough to beat the current best solution. It can trim many branches and their subtrees from the decision tree, so it can be faster than the exhaustive search.

Branch and bound searches any path through the tree that might lead to a solution better than the best solution found so far. This means, like exhaustive search, it always finds the optimal solution. The decision trees can be enormous, so branch and bound can still be fairly slow.

Generalized Partition Problem

In the partition problem, teh goal is to divide a set of objects into two groups with equal weight. In the generalized partition problem, the goal is to divide a set of objects into K groups with equal weight.

The decision tree for this problem has K branches at each node corresponding to putting the item at that level of the tree into one of the K different partitions. For N items, the tree height is N levels, so it contains KN leaf nodes. The same heuristics that work for the partition problem also work for the generalized partition problem, although they are more complicated.

Subset Sum

In the subset sum problem, given a set of numbers, determine whether there is a subset whose sum is 0. For example, consider the set {-11, -7, -5, -3, 4, 6, 9, 12, 14}. This set has the zero-sum subset {-7, -5, -3, 6, 9}. A related optimization version of the problem is to find a subset with a sum close to 0.

This problem can be modeled with a decision tree. The items can be divided into two groups - one that holds objects to go into the zero-sum set and one that holds objects that will be discarded.

Similar to the partition problem, for N items, the tree has N levels and each node has two branches - one corresponding to adding an item to the zero-sum set and one corresponding to discarding the item, so the tree has 2N leaf nodes. Branch and bound can be used on the optimization version of this problem but not on the nonoptimization version.

Knapsack

In the knapsack problem, given a set of objects that each have a weight and a value, and a knapsack that holds a certain amount of weight. The goal is to find the items with the maximum value that will fit in the knapsack.

The knapsack problem is similar to the partition problem, in which we divide the items into two grops - one with items to go into the knapsack and one with items to remain outside. In the knapsack problem, the goal is to make the first group as valuable as possible and also ensure it fits inside the knapsack.

The knapsack’s basic structure is similar to that of the partition problem, a decision tree can be used. The differences are the goal and that not all assignments are legal due to the weight constraint. Branch and bound can be used to stop examining a partial solution if its total weight already exceeded the knapsack’s capacity.

Branch and bound can also stop if the total value of the unconsidered item was insufficient to raise the current solution’s value enough to beat the best solution found so far. The decision tree for a knapsack problem with N items contains 2N leaf nodes.

Decision trees are a powerful tool for approaching complex problems. Just like any tool they are not the best approach for every problem. For example, 8 Queens problem can be modeled using decision tree but that would lead to a huge tree. However most most of the tree’s branches can be pruned by narrowing the domains of the variables.

Thinking about the problem as a general decision tree may be a mistake, because it might make you miss the simplifications that let you solve the problem efficiently.

Still, decision trees are a powerful technique that should be considered if a better approach to a problem is not unknown.