Order Statistic Tree

An order statistic tree augments a binary search tree to efficiently support order statistic queries like finding the kth smallest element.

Additional fields store subtree node counts. Updating counts on modifications allows O(log n) order statistic queries.

Java example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Node {
  int key;
  int size; // num nodes in subtree
  Node left, right;

  public Node(int key) {
    this.key = key;
    size = 1;
  } 
}

int kthSmallest(Node root, int k) {
  int leftSize = (root.left != null) ? root.left.size : 0;

  if (k <= leftSize) {
    return kthSmallest(root.left, k);
  } else if (k > leftSize + 1) {
    return kthSmallest(root.right, k - leftSize - 1); 
  }

  // k == leftSize + 1
  return root.key;
} 

C++ example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct Node {
  int key;
  int size;
  Node *left, *right;

  Node(int key) : key(key) {
    size = 1;
  }
};

int kthSmallest(Node* root, int k) {

  if (root == NULL) return INT_MAX;
  
  int leftSize = root->left != NULL ? root->left->size : 0;

  if (k <= leftSize) {
    return kthSmallest(root->left, k);
  } else if (k > leftSize + 1) {  
    return kthSmallest(root->right, k - leftSize - 1);
  }

  // k == leftSize + 1
  return root->key;
}

Python example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
class Node:

  def __init__(self, key):
    self.key = key
    self.size = 1
    self.left = None
    self.right = None

def kth_smallest(root, k):

  if root is None:
    return

  left_size = root.left.size if root.left else 0

  if k <= left_size:
    return kth_smallest(root.left, k)
  elif k > left_size + 1:
    return kth_smallest(root.right, k - left_size - 1)
  
  # k == left_size + 1
  return root.key

Order statistic trees allow fast rank-based queries on tree data structures.

An order statistic tree (OST) is a binary search tree that supports the following operations:

  • insert(x): Inserts the element x into the tree.
  • getRank(x): Returns the number of elements in the tree that are less than or equal to x.
  • select(k): Returns the element with rank k in the tree.

The order statistic tree can be used to solve a variety of problems, such as: Finding the median of a stream of numbers. Finding the kth smallest element in a list. Counting the number of elements in a range.

The order statistic tree can be implemented in a variety of ways. One common implementation is to use a balanced binary search tree. This ensures that all operations can be performed in time O(log n), where n is the number of elements in the tree.

Another common implementation is to use a splay tree. This ensures that the most recently accessed elements are moved to the top of the tree. This can improve the performance of operations such as getRank() and select(). The order statistic tree is a powerful data structure that can be used to solve a variety of problems.

An Order Statistic Tree is a modified binary search tree that allows queries for the rank of a given element, as well as the ability to access an element based on its rank. Each node in the tree stores additional information: the size of the subtree rooted at that node. This extra attribute allows the tree to answer rank-based queries in O(log n) time, where n is the number of elements in the tree. The key takeaway is that Order Statistic Trees combine the best properties of both binary search trees and arrays to offer efficient rank-based operations.

Java Code for Order Statistic Trees

Here’s a simplified Java class that implements an Order Statistic Tree:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
public class OSTNode {
    int key, size;
    OSTNode left, right;

    public OSTNode(int key) {
        this.key = key;
        this.size = 1;
        this.left = null;
        this.right = null;
    }
}

public class OrderStatisticTree {
    OSTNode root;

    // Update size attribute
    private int size(OSTNode node) {
        return (node == null) ? 0 : node.size;
    }

    // Insert function, also updates the size
    public OSTNode insert(OSTNode root, int key) {
        if (root == null) return new OSTNode(key);

        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);

        root.size = size(root.left) + size(root.right) + 1;
        return root;
    }

    // Find the kth smallest element
    public int kthSmallest(OSTNode root, int k) {
        int rank = size(root.left) + 1;
        
        if (rank == k) return root.key;
        if (rank > k) return kthSmallest(root.left, k);
        return kthSmallest(root.right, k - rank);
    }
}
  1. OSTNode is the structure of each node, holding a key, size, left, and right.
  2. insert(OSTNode root, int key) adds a node to the tree and updates sizes.
  3. kthSmallest(OSTNode root, int k) returns the kth smallest element.

C++ Code for Order Statistic Trees

In C++, we can represent an Order Statistic Tree like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
struct OSTNode {
    int key, size;
    OSTNode *left, *right;
};

class OrderStatisticTree {
public:
    OSTNode* root;

    int size(OSTNode* node) {
        return (node == nullptr) ? 0 : node->size;
    }

    OSTNode* insert(OSTNode* root, int key) {
        if (!root) return new OSTNode{key, 1, nullptr, nullptr};

        if (key < root->key) root->left = insert(root->left, key);
        else root->right = insert(root->right, key);

        root->size = size(root->left) + size(root->right) + 1;
        return root;
    }

    int kthSmallest(OSTNode* root, int k) {
        int rank = size(root->left) + 1;

        if (rank == k) return root->key;
        if (rank > k) return kthSmallest(root->left, k);
        return kthSmallest(root->right, k - rank);
    }
};
  1. insert(OSTNode* root, int key) and kthSmallest(OSTNode* root, int k) are similar to their Java counterparts but use C++ syntax.

Python Code for Order Statistic Trees

Here’s how an Order Statistic Tree can be implemented in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class OSTNode:
    def __init__(self, key):
        self.key = key
        self.size = 1
        self.left = None
        self.right = None

class OrderStatisticTree:
    def __init__(self):
        self.root = None

    def size(self, node):
        return 0 if node is None else node.size

    def insert(self, root, key):
        if root is None:
            return OSTNode(key)
        
        if key < root.key:
            root.left = self.insert(root.left, key)
        else:
            root.right = self.insert(root.right, key)

        root.size = self.size(root.left) + self.size(root.right) + 1
        return root

    def kthSmallest(self, root, k):
        rank = self.size(root.left) + 1

        if rank == k:
            return root.key
        if rank > k:
            return self.kthSmallest(root.left, k)
        return self.kthSmallest(root.right, k - rank)
  1. The OSTNode class defines the node structure.
  2. The OrderStatisticTree class includes methods to insert nodes and find the kth smallest element, similar to Java and C++ versions.

All these implementations enable the Order Statistic Tree to support dynamic rank-based queries efficiently.