Optimal Binary Search Tree

An optimal binary search tree (OBST) is a binary search tree that minimizes the total search cost given the frequency of searching for each key. The optimal BST balances two factors:

  • The tree should be balanced and shallow to minimize search time.

  • Keys that are searched for more frequently should be higher up.

An OBST can be constructed using dynamic programming. The optimal structure depends on the given search frequencies.

Applications of OBST include:

  • Efficient searching in databases
  • Indexing data for quick retrieval
  • Data compression algorithms

By optimizing for search cost, OBST improves performance compared to standard BSTs.

Example in Java:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Keys and search frequencies
double[] keys = {10, 20, 30, 40, 50};
double[] freq = {0.1, 0.2, 0.4, 0.2, 0.1}; 

OptimalBST bst = new OptimalBST(keys, freq);

// Construct optimal BST
bst.construct();

// Searches are efficient
bst.search(30);

Example in C++:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Keys and frequencies
vector<int> keys = {10, 15, 20, 25, 30};
vector<double> freq = {0.15, 0.1, 0.2, 0.3, 0.25};

OptimalBST bst(keys, freq); 

// Build optimal BST
bst.build();  

// Fast searching 
bst.search(15);

Example in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Keys and frequencies 
keys = [10, 20, 30, 40, 50]
freq = [0.1, 0.2, 0.4, 0.2, 0.1]  

bst = OptimalBST(keys, freq)

# Construct OBST
bst.construct()  

# Efficient searching
bst.search(40)

In summary, an optimal BST efficiently organizes keys to minimize total search cost based on frequencies. It balances depth vs frequency.

An Optimal Binary Search Tree (OBST) is a binary search tree for which the nodes are arranged in a way that minimizes the total access time or cost. Given a set of keys and their probabilities of being accessed, the OBST aims to create a tree with minimum expected search time. The concept often utilizes dynamic programming to find the optimal solution.

Java Code for Optimal Binary Search Tree

Here’s a Java implementation of building an OBST:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
public class OptimalBST {
    public static double optimalBST(double freq[], int n) {
        double[][] cost = new double[n + 1][n + 1];
        for (int i = 0; i <= n; i++) {
            cost[i][i] = freq[i];
        }
        for (int l = 2; l <= n; l++) {
            for (int i = 0; i <= n - l + 1; i++) {
                int j = i + l - 1;
                cost[i][j] = Double.MAX_VALUE;
                for (int r = i; r <= j; r++) {
                    double c = ((r > i) ? cost[i][r - 1] : 0) + ((r < j) ? cost[r + 1][j] : 0) + freq[i];
                    if (c < cost[i][j]) {
                        cost[i][j] = c;
                    }
                }
            }
        }
        return cost[0][n - 1];
    }
}

C++ Code for Optimal Binary Search Tree

Here’s a C++ code snippet:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

double optimalBST(vector<double> freq, int n) {
    vector<vector<double>> cost(n + 1, vector<double>(n + 1, 0));
    for (int i = 0; i < n; ++i) {
        cost[i][i] = freq[i];
    }
    for (int l = 2; l <= n; ++l) {
        for (int i = 0; i <= n - l + 1; ++i) {
            int j = i + l - 1;
            cost[i][j] = DBL_MAX;
            for (int r = i; r <= j; ++r) {
                double c = ((r > i) ? cost[i][r - 1] : 0) + ((r < j) ? cost[r + 1][j] : 0) + freq[i];
                cost[i][j] = min(cost[i][j], c);
            }
        }
    }
    return cost[0][n - 1];
}

Python Code for Optimal Binary Search Tree

Here’s how you’d write it in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def optimal_bst(freq, n):
    cost = [[0 for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n):
        cost[i][i] = freq[i]
    for l in range(2, n + 1):
        for i in range(n - l + 1):
            j = i + l - 1
            cost[i][j] = float('inf')
            for r in range(i, j + 1):
                c = (cost[i][r - 1] if r > i else 0) + (cost[r + 1][j] if r < j else 0) + freq[i]
                cost[i][j] = min(cost[i][j], c)
    return cost[0][n - 1]

These implementations use a 2D array cost[][] to store the cost of subproblems. It is populated based on the frequency array to find the minimum expected search time for the binary search tree.