Union-find

Introduction

A union-find data structure (also known as a disjoint set), is used to efficiently store items in disjoint (non-overlapping) sets.

The key union-find operations are:

• make_set(x): make set with x as a member of its set.
• find(x): return the root element of x.
• union(x,y): combine the sets of x and y.

Union-find represents sets as a backwards tree, where each node has a corresponding parent. The root node defines which set the node and all its children are in.

A root node has its parent set to itself, to identify that itâ€™s the root in a set [1, P. 199].

Merging sets can be done by setting the parent of one set to the parent of another. In the case of a merge, the smallest subtree becomes the parent of the other [1, P. 199].

At most $\lg_2 n$ doublings can occur before all sets have been merged. That means find() and union() are both $O(\log n)$ operations [1, P. 201].

Representing a union-find

A union-find can be represented as an array of nodes:

typedef struct {
int p[SET_SIZE+1]; // parent element
int size[SET_SIZE+1]; // number of elements in subtree i
int n; // number of elements in set
} set_union;


The following are implementations of the key operations:


void set_union_make_set(set_union *s, int x) {
s->p[i] = i;
s->size[i] = 1;
}

void set_union_init(set_union *s, int n) {
for (int i = 1; i <= n; i++) {
set_union_make_set(s, i)
}
s->n = n;
}

int set_union_find(set_union *s, int x) {
if (s->p[x] == x) {
return(x);
}
return set_union_find(s, s->p[x]);
}

int set_union_union(set_union *s, int s1, int s2) {
int r1, r2; /* roots of sets */
r1 = find(s, s1);
r2 = find(s, s2);
if (r1 == r2) return;

if (s->size[r1] >= s->size[r2]) {
s->size[r1] = s->size[r1] + s->size[r2];
s->p[r2] = r1;
}
else {
s->size[r2] = s->size[r1] + s->size[r2];
s->p[r1] = r2;
}
}


[1, P. 200]

References

1. [1] S. Skiena, The Algorithm Design Manual, 2nd ed. Springer, 2008.