Data Structure (7) - Disjoint Sets

Disjoint Sets

Disjoint Sets is the data structure that supports the operation of adding one element (as their own set), [finding the label/canonical_name of one element in a set], and union 2 sets.

It can represents data like:

  • Set1 (people whose favorite color's green)
  • Set2 (people whose favorite color's blue)
  • Set3 (people whose favorite color's red)
  • ...

For set S = {S1, S2, S3, ... , Sk}:

  • find(Si) where 1<=i<=k, gives representative integer of the set.

e.g. find(Si) = min(S)

  • join(Set1, Set2): join 2 sets into one. After this operation, find(Si) where Si was in Set1, will equal to find(Sj) where Sj was in Set2.
Example: Disjoint Sets with min value as label

Key Ideas

  • Each element exists in exactly one set.
  • (Elements in sets are not ordered.)
  • (If the element is not integer, hash it to integer first.)
  • Every set is an equitant representation: 4 belongs to [0]R, 8 belongs to [0]R, then find(4)==find(8) is true (find(4) is 0, and find(8) is 0).

Build

When building disjoint sets:

  • Start with everything separated
  • Build sets by union the elements

Implementations

Array

Store label of item at array[hash(item)]

  • find: O(1)
  • union: O(n) – need to go through the array to update labels of one set

Up-Tree

Continue using an array where index is the key. Start all content in array as -1.

The value of array is:

  • -1 if we've found the representative element
  • The index of parent, if we haven't found the rep.
Example: Disjoint Sets Implementation - UpTree
  • find: O(h) where h is the height of the up-tree, and h is only bounded by n (size of the array). So we would like to keep the trees short.
  • union: O(1)

Make up-trees faster: instead of storing -1 at roots, store -[number of nodes in tree] and use smart union:

int DisjointSets::find(int i) {
    if(arr_[i]<0) return i; // this is the root
    else return find(arr_[i]);
}

void DisjointSets::union(int r1, int r2) {
    arr_[r1] = r2; // points r1 to r2
}

void DisjointSets::unionBySize(int r1, int r2) {
    // union by size: keeps the number of nodes in a tree smaller
    
    int newSize = arr_[r1] + arr_[r2];
    
    if(arr_[r1]<arr_[r2]) { // r1 has more nodes
        arr_[r2] = r1; // so only -arr_[r2] nodes gets 1 layer higher
        arr_[r1] = newSize;
    } else {
        arr_[r1] = r2;
        arr_[r2] = newSize;
    }
}

void DisjointSets::unionByHeight(int r1, int r2) {
    // union by height: keeps the height of all trees smaller
    
    if(arr_[r1]< arr_[r2]) { // r1 is heigher
        arr_[r2] = r1; // so r1 doesn't get heigher
    } else if (arr_[r1]>arr_[r2]) {
        arr_[r1] = r2;
    } else { // tree gets higher no matter what
        arr_[r1] = r2;
        arr_[r2] -= 1;
    }
}

which makes the height of the trees: O(log(n))

Path Compression

int DisjointSets::find(int i) {
    if(arr_[i]<0) return i; // this is the root
    else {
        int root = find(arr_[i]);
        arr_[i] = root;
        return root;
    }
}

which makes find O(log(n)) amortized.

If we use both smart union and path compression?

The iterated log function:

log*(n) = {

0 , n<=1

1+log*(log(n)) , n>1

}

e.g. What's lg*(265535)?

1: lg(265535) = 65535,

2: lg(65535) = 16,

3: lg(16) = 4,

4: lg(4) = 2,

5: lg(2) = 1

so lg*(265535) = 5

union_find: O(m*α(n)) or O(α(n)) amortized, where n is number of items in Disjoint Sets. For all practice inputs, it's roughly O(1)

Ref: