Data Structure (3) - BST & AVL Tree

Binary Search Tree

A BST (binary search tree) is a tree T such:

  • T = {} or
  • T = {r, TL, TR} where all nodes in TL has keys less than r, all nodes in TR has keys greater than r, and TL and TR are BSTs.
// BST.h
#pragma once

template <typename K, typename V>
class BST {
    public:
        BST();
        void insert(const K & key, const V & value);
        void remove(const K & key);
        K find(const K & key) const;
        
    private:
        struct TreeNode {
            TreeNode *left, * right;
            K key;
            V value;
            TreeNode(const K & k, const V & v): key(k), value(v), left(NULL), right(NULL){}
        }
        
        TreeNode *root_;
    
}


// cpp
template <typename K, typename V>
V BST::find(const K & key) {
    TreeNode * node = _find(root_,key);
    if(node) {
        return node->value;
    } else {
        // deal with null case
        return V();
    }
}


template <typename K, typename V>
TreeNode *& _find(TreeNode *&node, const K & key) {
// returns a pointer reference of a treenode
// so if we want to find then update/insert, we can do it

    if(node == nullptr) { return node;} // can't find the key
    if(node->k==key) { // find exactly the key
        return root;
    } else {
        if(node->k > key) {
            return _find(node->left, key);
        } else {
            return _find(node->right, key);
        }
    }
}


template <typename K, typename V>
void BST::insert(const K & key, const V & value) {
	_insert(root_, key, value);
}

template <typename K, typename V>
_insert(TreeNode *& root, const K & key, const V & value) {
    TreeNode *& loc = _find(root,key);
    if(loc == nullptr) {
        loc = new TreeNode(key, value);
    }
}

BST Remove:

IOP (In-order predecessor): left tree right most child. Largest node that is less than root.

IOS (In-order successor): right tree left most child. Smallest node that is larger than node.

When removing a node in BST:

  • find the key
  • find IOP or IOS
  • swap node with key and IOP/IOS
  • delete the node with key (it's 0-child delete now)

BST Worst Cases:

  • find: O(h)
  • insert: O(h)
  • remove: O(h)   —> O(h)+O(h)  (find the node to remove, and find it's IOP/IOS)
  • traversal: O(n)

For all BST, Lower bound: O(lg(n)) (when it is balanced), Upper bound: O(n) (when it is like a stick)

For all BST with n nodes (randomly sequenced), the average height of the trees is lg(n)

A balanced BST (O(h) for find, insert, delete, where h=lg(n), and O(n) for traverse) is a better choice for dictionary than sorted array (O(n) for insert, delete, traverse, O(lg(n) for find) and sorted list (O(n) for find, insert, delete, traverse).

Height Balanced Tree

Height Balance: b = height(TR)-height(TL)

A tree T is height balanced if :

  • T = {}
  • T = {r, TL, TR}, |b|<=1, and TL and TR are balanced trees

A complete tree is always balanced.

Examples: balanced trees and imbalanced trees

BST Rotation

  • 4 kinds of rotations (L, R, LR, RL) (LR & RL for elbows)
  • All rotations are local (subtrees are not impacted)
  • All rotations are constant time: O(1)
  • BST property maintained

Goal: produce trees of height h=O(log(n)) using only O(log(n)) time for each operation.

BST Rotation
AVL Rotations (L, R, RL, LR)

We call these trees AVL (Adelson-Velokii-Landis) trees.

AVL Tree

Three issues for consideration:

  • Rotations: to fix imbalance
  • Maintaining height: track height of each node
  • Detect imbalance: when a node become out of balance

When a new node is inserted, only when the tree becomes taller, it becomes imbalanced.

Insert into an AVL tree:

  • Insert at proper place – O(h)
  • Check for imbalance – O(1)
  • Rotate if necessary – O(1)
  • Update height
struct TreeNode {
    T key;
    unsigned height;
    TreeNode *left;
    TreeNode *right;
}


template <class T>
void AVLTree::_insert(const T &x, TreeNode<T> *& t) {
    if(t==NULL) { // base case. insert into empty tree
        t = new TreeNode(x,0,NULL,NULL);
    }
    else if(x < t->key) { // insert into left sub-tree
        _insert(x, t->left);
        int balance = height(t->right) - height(t->left);
        // helper func to handle when t doesn't have a left or right child
        int leftBalance = height(t->left->right) - height(t->left->left);
        
        if(balance == 2) { // out of balance
        	if(leftBalance == -1) { // a stick
            	rotateR(t);
            }
            else { // elbow
            	rotateLR(t);
            }
        }
    }
    else if(x > t->key) { // insert into right sub-tree
        _insert(x, t->right);
        int balance = height(t->right) - height(t->left);
        int rightBalance = height(t->right->right) - height(t->right->left);
        if(balance == 2) { // out of balance
            if(rightBalance == 1) { // a stick
            	rotateL(t);
            } else { // elbow
                rotateRL(t)l
            }
        }
    } 
    
    t->height = 1 + max(height(t->left), height(t->right));
}

Analysis

Find runs at O(h)

Insert runs at O(h) find + O(1) add + O(1) rebalance, total O(h)

Remove runs at O(h) find + O(h) find IOP/IOS + O(1) remove + h*O(1) rebalance, total O(h)

Where h is O(log(n)).

Minimum number of nodes

Let N(h) be smallest number of nodes in an AVL tree of height h:

N(-1) = 0,

N(0) = 1,

N(1) = 2,

...

N(h) = 1 + N(h-1) + N(h-2),

which is, root + a sub tree of (h-1) and a sub tree of (h-2), balance = 1 or -1

So, N(h) > N(h-1) + N(h-2) >= 2*N(h-2) >= 2h/2  (when h>=1)

An AVL tree of height h has at least 2h/2 nodes.

Invert it:

n >= N(h) > 2h/2,

n>= 2h/2

log(n) >= h/2

h<= 2 log(n) — where log(n) > 0

So for an AVL tree, h=O(log(n))

Summary of Balanced BST Trees

AVL

  • Max height: 1.44 * lg(n)
  • Rotations:
  • find – 0 rotation
  • insert – 1 rotation at most (LR, RL count as 1)
  • remove – h rotations at most, where h is O(log(n))

Red-Black Tree

  • Max height: 2 * lg(n)
  • Constant numbers of rotations:
  • insert – 2 rotations at most
  • remove – 3 rotations at most
  • Works better with modern hardware (more caches and predicters)
  • (C++'s standard map is Red-Black tree)
// map API
map<string,int> grades;
//....
grades["someone"] = 100;
cout << "Someone's grade is: " << grades["someone"] << endl;
grades.erase("someone");

Pros of Cons of BST

Pros

Running time O(log(n)), improvement over lists and arrays as dictionary.

Good for:

  • Approximate find: Each step (the sub-tree) is closer to target
  • Range and nearest neighbor find

Cons

  • Never O(1)
  • In-memory requirement: following pointer is O(1) and fast -> assume everything is in memory (if it points to resources elsewhere, like on another computer or database over Internet, it could be slow)

ref: https://courses.engr.illinois.edu/cs225/fa2020/