Minimum Spanning Tree - Kruskal's Algorithm

Problem Definition

Given an undirected graph G = (V, E) and a weight function w : E → R that assigns a real-number weight to each edge, find T, an acyclic subset of E that connects all the vertices and whose total weight w(T) = Σ w(e) over all e ∈ T is minimized.
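To make the definition concrete, here is a minimal brute-force sketch in Python. It is not Kruskal's algorithm: it tries every set of |V| - 1 edges and keeps the cheapest one that connects all the vertices. The function name brute_force_mst, the (weight, u, v) edge format and the small example graph are assumptions made for illustration. It takes exponential time, so it is only useful for checking tiny examples by hand.

```python
from itertools import combinations

def brute_force_mst(vertices, edges):
    """Try every subset of |V| - 1 edges; keep the cheapest one that connects all vertices.
    edges: list of (weight, u, v) tuples for an undirected graph (assumed format).
    Returns (total_weight, tree_edges), or None if no spanning tree exists."""
    vertices = list(vertices)
    best = None
    for subset in combinations(edges, len(vertices) - 1):
        # Give each vertex a component label, then merge labels along each chosen edge.
        label = {v: v for v in vertices}
        for _, u, v in subset:
            old, new = label[u], label[v]
            for x in vertices:
                if label[x] == old:
                    label[x] = new
        # |V| - 1 edges that connect every vertex always form a tree (no cycles).
        if len(set(label.values())) == 1:
            total = sum(w for w, _, _ in subset)
            if best is None or total < best[0]:
                best = (total, list(subset))
    return best

graph_vertices = ["A", "B", "C", "D"]
graph_edges = [(1, "A", "B"), (4, "A", "C"), (3, "B", "C"), (2, "C", "D"), (5, "B", "D")]
print(brute_force_mst(graph_vertices, graph_edges)[0])  # 6
```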

Example:  You're connecting houses with telephone wire, and you want to find the minimum amount of wire needed to connect all the houses.  The houses don't need to be wired together in loops (the wiring can be acyclic), every house must be connected to the phone system (we want to connect all the vertices), and we want to use as little wire as possible (we want to minimize the total weight of the edges).

Notes about the algorithm

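We're going to produce one (or more) "trees".  If the graph is connected we get a single tree; if it is not, we get one tree per connected component.  The tree (or trees) will "span" the graph, meaning that together they reach every single vertex.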

Because the tree(s) we find will have the smallest possible total weight, the result is called a minimum spanning tree (when the graph is not connected, the trees together form a minimum spanning forest).

We're going to find a solution using a greedy algorithm: one that makes each decision based on whatever looks best at the moment.  For Kruskal's algorithm, the greedy choice (always take the cheapest remaining edge that doesn't create a cycle) produces a globally optimal result, but greedy algorithms don't guarantee this in general.
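A standard illustration of greedy choices going wrong (unrelated to spanning trees) is making change with the fewest coins. The Python sketch below assumes coin values of 1, 3 and 4, chosen only for illustration: greedy takes the largest coin that fits (4 + 1 + 1, three coins), while the optimal answer is 3 + 3 (two coins). The function names are hypothetical.

```python
def greedy_coin_count(coins, amount):
    """Greedy: always take as many of the largest remaining coin as fit."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def optimal_coin_count(coins, amount):
    """Dynamic programming: fewest coins for every total from 0 up to amount."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for total in range(1, amount + 1):
        for coin in coins:
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1
    return best[amount] if best[amount] != INF else None

print(greedy_coin_count([1, 3, 4], 6))   # 3  (4 + 1 + 1)
print(optimal_coin_count([1, 3, 4], 6))  # 2  (3 + 3)
```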

Intuitive Overview of the Algorithm

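Think of each vertex as a separate drop of mercury.  We look at the edges one at a time, cheapest first.  If an edge connects two different drops, we add it to the tree, and the two drops merge into one larger drop (just as two drops of mercury that touch flow together).  If both ends of the edge are already in the same drop, adding it would create a cycle, so we skip it.  When we run out of edges, each remaining drop is one tree; if the graph is connected there is exactly one drop, and the edges we kept form the minimum spanning tree.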


We're going to represent each 'drop of mercury' (a group of vertices that are already connected to each other) as a set, so let's look at a new abstract data type (ADT): the set.

New Abstract Data Type: Set

A set is a collection of things (objects) that have no duplicates.
(In contrast, a bag is a collection of things that allows duplicates).

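Examples of sets:
    { 1, 2, 3 }
    { red, green, blue }
    the vertices V of a graph

Examples of bags (NOT sets):
    { 1, 1, 2, 3 }  (1 appears twice)
    the words in "the cat saw the dog"  ("the" appears twice)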

Sets are a common mathematical 'thing', particularly in discrete math courses. 
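In Python, the built-in set type behaves like the sets described above, and collections.Counter can serve as a bag. A small sketch (the color values are just sample data):

```python
from collections import Counter

# Python's built-in set keeps only one copy of each item.
colors = {"red", "green", "red", "blue"}
print(len(colors))         # 3

# A Counter can act as a bag: it remembers how many times each item was added.
bag = Counter(["red", "green", "red", "blue"])
print(bag["red"])          # 2
print(sum(bag.values()))   # 4
```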

How sets store objects

In Kruskal's algorithm we need to be able to quickly find which set an object belongs to.

If the objects that we're storing cannot be customized for our set class, then we could use a hash table that maps each object to the set that contains it.  If we know how many vertices there are, then we can choose a size for the hash table such that each lookup takes O( 1 ) expected time.
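A minimal Python sketch of the hash-table approach, assuming the vertices are plain strings that we can't add fields to. Python's dict is a hash table that resizes itself, so we don't have to pick its size up front. The names containing_set, find_set_containing and merge are hypothetical. Note that merge has to update the dict entry of every moved vertex, otherwise lookups would still return the old set.

```python
# Each vertex (a plain string) maps to the set that currently holds it.
vertices = ["A", "B", "C", "D"]
containing_set = {v: {v} for v in vertices}   # every vertex starts in its own set

def find_set_containing(v):
    return containing_set[v]                  # expected O(1) dict lookup

def merge(u, v):
    """Move every vertex of v's set into u's set, updating the dict entries."""
    keep, moved = find_set_containing(u), find_set_containing(v)
    if keep is moved:
        return                                # already in the same set
    keep.update(moved)
    for x in moved:
        containing_set[x] = keep

merge("A", "B")
print(find_set_containing("B") is find_set_containing("A"))  # True
print(sorted(find_set_containing("A")))                      # ['A', 'B']
```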

However, if the objects that we're storing can be modified or designed for this use, then we could add a 'containing set' field to each object.  Then, once we have an object, finding its containing set is a quick O( 1 ) operation.
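A sketch of the 'containing set' field approach, assuming we control the vertex class. The class Vertex, its containing_set attribute and the union helper are hypothetical names for illustration. As with the hash table, union must update the field of every vertex it moves; otherwise the field would keep pointing at the old set.

```python
class Vertex:
    """A vertex we are free to design, so it carries a 'containing set' field."""
    def __init__(self, name):
        self.name = name
        self.containing_set = {self}   # starts out in its own singleton set

def union(keep, other):
    """Move every vertex of `other` into `keep`, updating each vertex's field."""
    if keep is other:
        return                         # already the same set
    for vertex in other:
        keep.add(vertex)
        vertex.containing_set = keep

a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
union(a.containing_set, b.containing_set)
print(b.containing_set is a.containing_set)           # True: an O(1) field lookup
print(sorted(v.name for v in a.containing_set))       # ['a', 'b']
```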

The union operation on a set

We're going to make use of the 'union' operation, which will merge two sets together.  Let's look at some quick pseudocode for combining one set into another:

// NOTE: This will change the set that it's called on
// (by adding all the elements of otherSet to this set)
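Set::union( Set otherSet )
    foreach item in otherSet
        add item to this set
        record that item's containing set is now this set
            // (update its 'containing set' field or its hash table entry,
            //  so that findSetContaining returns the merged set afterwards)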

Note that this changes the set we call it on.  For our purposes this is fine, but in other situations you might prefer a different version.  An alternate approach is to have the method create a new set, add everything from both this set and otherSet to that new set, and then return it.  That version is less efficient (the new set requires extra memory, and copying every element into it takes time) but it is safer, because it never accidentally modifies the set you're currently working with.
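Python's built-in sets happen to offer both versions: set.update changes the set in place (like the pseudocode above), while set.union (or the | operator) builds and returns a new set. The variable names below are just for illustration.

```python
# In-place: update() adds other_set's items to this_set, changing this_set.
this_set = {"A", "B"}
other_set = {"C"}
this_set.update(other_set)
print(sorted(this_set))   # ['A', 'B', 'C']

# Non-mutating: union() returns a new set and leaves both inputs unchanged.
left, right = {"A", "B"}, {"C"}
merged = left.union(right)
print(sorted(merged))     # ['A', 'B', 'C']
print(sorted(left))       # ['A', 'B']
```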


Pseudocode for Kruskal's Algorithm

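Running time: O( E log E ), assuming a connected graph.  Building the heap takes O( E ) time and each of the E calls to getMin takes O( log E ) time.  The set operations stay within this bound as long as findSetContaining is O( 1 ) and union always moves the smaller set into the larger one, so that each vertex changes sets at most log V times.  For a simple graph E ≤ V², so log E ≤ 2 log V, which means this is also O( E log V ).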

Kruskal( Graph G )
    Set MST = new Set() // starts out empty
                        // MST will contain the minimum spanning tree when we're done.
                       
    SetCollection allSets = new SetCollection()
       // allSets is a set of all the sets
       // I named it SetCollection just to make it easier to keep this straight
       //     - we'd probably use another Set in real code
       // Set and SetCollection provide add, remove, and find (findSetContaining)
    foreach Vertex v in G.V:
        add v to its own new set (call it singletonSet)
        add singletonSet to allSets
    PriorityQueue edges = new PriorityQueue()
        // This is a min heap, based on the weight of the edge
    foreach Edge e in G.E:
        add e to edges
    edges.BuildHeap()

    while edges is not empty:
        Edge e = edges.getMin() // assume this removes e and re-heapifies
        Set from = allSets.findSetContaining( e.start )
        Set to = allSets.findSetContaining( e.end )
        if from != to: // e joins two different sets, so it can't create a cycle
            MST.add( e ) // this edge is part of the min. spanning tree
            // merge the two sets into a single set
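            allSets.remove(from)
            allSets.remove(to)
            if from.size() < to.size():
                swap from and to // always move the smaller set into the larger one
            from.union(to) // add the vertices in 'to' to the 'from' set
            allSets.add( from )
    return MST // the edges of the minimum spanning tree (or forest)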
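A minimal Python sketch of the pseudocode, assuming edges are given as (weight, start, end) tuples and vertices are hashable, comparable names such as strings. It uses the standard-library heapq module as the min heap and a dict in place of findSetContaining; the function name kruskal and the edge format are assumptions, not part of these notes. Like the pseudocode it never stops early, so a disconnected graph simply yields a minimum spanning forest.

```python
import heapq

def kruskal(vertices, edges):
    """edges: list of (weight, start, end) tuples for an undirected graph (assumed format).
    Returns the list of edges in the minimum spanning tree (or forest)."""
    mst = []
    # Every vertex starts in its own singleton set; the dict maps vertex -> its set.
    containing = {v: {v} for v in vertices}

    heap = list(edges)
    heapq.heapify(heap)                           # BuildHeap: O(E), min heap keyed on weight

    while heap:
        weight, start, end = heapq.heappop(heap)  # getMin: removes the cheapest edge
        from_set, to_set = containing[start], containing[end]
        if from_set is not to_set:                # different sets, so no cycle
            mst.append((weight, start, end))
            # Move the smaller set into the larger one, updating each moved vertex.
            if len(from_set) < len(to_set):
                from_set, to_set = to_set, from_set
            from_set.update(to_set)
            for v in to_set:
                containing[v] = from_set
    return mst

edges = [(1, "A", "B"), (4, "A", "C"), (3, "B", "C"), (2, "C", "D"), (5, "B", "D")]
tree = kruskal(["A", "B", "C", "D"], edges)
print(tree)                         # [(1, 'A', 'B'), (2, 'C', 'D'), (3, 'B', 'C')]
print(sum(w for w, _, _ in tree))   # 6
```

Because heappop compares whole tuples, edges with equal weights are ordered by vertex name, which is why this sketch assumes the names are comparable.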