Algorithms And Data Structures

09/23/98
 
Algorithms Analysis
Compression
Graph
  • Dijkstra's Shortest Path
  • Minimum Spanning Tree
Search
Sort
(1 Dimension)
Sort
(Multi-Dimension)
  • Geometric Hashing
Tree Searching
Data Structures
(Abstract Data Types)
Analysis:
What we're really concerned with is the 'order of growth' -- an algorithm that takes X*n^2 + Y*n time will take significantly longer
than an algorithm that takes Z*n time, almost no matter what X, Y, and Z are.  Certainly, as n gets large enough this is always true.
Theta( f(n) ) = an asymptotically tight bound; O( f(n) ) = an upper bound (what's usually quoted for worst-case analysis); Omega( f(n) ) = a lower bound
 

Problem: Array Searching
Input: A sorted array A of n values (possibly with duplicates), and a value to locate within the array.
Output: The location of the value within the array, or VALUE_NOT_FOUND if the value isn't present

Binary Search:  Divide-and-Conquer searching algorithm
Space: O(1) if written iteratively; the recursive version below uses O( lg(n) ) stack space
Time: O( lg(n) )
BinarySearch( Array A, int value, int start, int end)
{
    //base case: the range is empty
    if( start > end )
        return VALUE_NOT_FOUND;

    //inductive case: check the middle, then recurse into one half
    int middle = floor( (start+end) / 2 );
    if( A[middle] == value )
        return middle;
    else if( value > A[middle] )
        return BinarySearch( A, value, middle+1, end );
    else
        return BinarySearch( A, value, start, middle-1 );
}



Problem: Compression
Input: An array of numbers, A = {A1, ... , An}.
Output: An array of numbers, O = {O1, ..., Om}, such that the output array O requires less space to store, and such that A can be recovered from O.
Notes: There are many ways to compress data, depending on whether you want to compress lots of data on the fly (ex: a video phone for the Internet) or want really good compression at the cost of speed (ex: an archiver like WinRAR).

Huffman Encoding:  Greedy Algorithm for an Optimal, Variable Length Code
Space:
Time:

When designing a binary character code, we can use either a fixed-length code, like ASCII, where every character requires the same amount of space, or a variable-length code, where different characters require different amounts of space. We can take advantage of variable-length codes by assigning frequently used characters shorter codes, and thus save space. If we convert a document that uses a fixed-length code into one that uses a variable-length code that has been optimized for this document, we can often compress the document substantially. The Huffman code is a prefix code, meaning that no codeword is a prefix of some other codeword. It can be proven that a prefix code can always achieve the optimal compression obtainable with any character code.
The Huffman code will be constructed with the aid of a binary tree structure. The overall outline is as follows:

  1. Collect a count of each character/number, and sort the counts into ascending order in a random-access, queue-like structure (i.e., there are 5 T's, there are 12 H's, there are 16 E's)
  2. Grab the first two (i.e., smallest-count) elements from the queue. (T,H)
  3. Remove those two elements from the queue (the queue now contains E=16)
  4. Create a new binary tree, by creating a new parent node whose left child is the first element you grabbed from the queue (in this case, T), and whose right child is the second element you grabbed from the queue (in this case, H). Later, when we actually build the code, following the left branch will mean that we append a "0" to the end of the code, while following the right branch will mean that we append a "1" to the end of the code.
  5. Add the counts of the two children, and assign that value to the parent (in this case, it will be 17), and insert the parent back into the queue in sorted order (the queue now contains {E=16, BinaryTreeOfTandH=17} )
  6. If there's only one element in the queue, then we're done building the tree structure, and we can use it to figure out what the actual code will be. Otherwise, return to step 2, above
  7. Once the queue contains a single tree, we can build the actual code itself: traverse the entire tree, keeping a bit-string variable as you go. Initialize the variable to "", and if you follow a left branch, append "0"; if you follow a right branch, append "1". When you return to a node's parent, remove the last bit from the string. When you reach a leaf, print out the character stored at that leaf, along with the bit string (e.g., E = "0", T = "10", H = "11"). This generates an optimal Huffman code for the given collection of characters, which you can then use to compress documents, etc. (sketched in C below)
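
Here's a minimal sketch of the whole procedure in C, using the T/H/E counts from the example above. A sorted array stands in for the queue (a real implementation would use a proper priority queue), and the Node type and helper names are just for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char symbol;                   /* meaningful only at leaves */
    int count;
    struct Node *left, *right;
} Node;

/* The "queue": an array of tree pointers kept sorted, smallest count first */
static Node *queue[64];
static int queueSize = 0;

static void insertSorted( Node *n )
{
    int i = queueSize++;
    while( i > 0 && queue[i-1]->count > n->count )
    {
        queue[i] = queue[i-1];   /* shift bigger trees right */
        i--;
    }
    queue[i] = n;
}

/* Steps 2 and 3: remove and return the smallest-count tree */
static Node *removeSmallest( void )
{
    Node *n = queue[0];
    queueSize--;
    memmove( &queue[0], &queue[1], queueSize * sizeof(Node *) );
    return n;
}

static Node *makeNode( char symbol, int count, Node *left, Node *right )
{
    Node *n = malloc( sizeof(Node) );
    n->symbol = symbol;  n->count = count;
    n->left = left;      n->right = right;
    return n;
}

/* Step 7: append '0' for a left branch, '1' for a right branch, and
   print the accumulated bit string at each leaf */
static void printCodes( Node *n, char *bits, int depth )
{
    if( n->left == NULL && n->right == NULL )
    {
        bits[depth] = '\0';
        printf( "%c = \"%s\"\n", n->symbol, bits );
        return;
    }
    bits[depth] = '0';  printCodes( n->left, bits, depth+1 );
    bits[depth] = '1';  printCodes( n->right, bits, depth+1 );
}

int main( void )
{
    char bits[64];

    /* step 1: the counts from the example above */
    insertSorted( makeNode( 'T', 5, NULL, NULL ) );
    insertSorted( makeNode( 'H', 12, NULL, NULL ) );
    insertSorted( makeNode( 'E', 16, NULL, NULL ) );

    /* steps 2 - 6: merge the two smallest trees until one remains */
    while( queueSize > 1 )
    {
        Node *a = removeSmallest();
        Node *b = removeSmallest();
        insertSorted( makeNode( '\0', a->count + b->count, a, b ) );
    }

    printCodes( queue[0], bits, 0 );   /* prints E = "0", T = "10", H = "11" */
    return 0;
}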




Problem: Sort
One Dimensional
Input: a sequence of numbers {A1, ..., An }
Output: a sequence of numbers {A1', ..., An'} that is a permutation of the input, such that for 1 <= i < n, Ai' <= A(i+1)'
Notes: Sequence usually represented using an array

Bubble Sort:  Incremental sorting algorithm
Space: In-place, or O(n)
Time: Theta( n^2 )
(Using an array, sorting into ascending order) Start a 'left' pointer at the leftmost end of the array, and during each iteration move it one slot towards the right.  During each iteration, start another pointer at the rightmost end of the array and move it left until it reaches the left pointer's slot, comparing each pair of adjacent elements along the way and swapping them if they're out of order.  This 'bubbles' the smallest remaining element down into the left pointer's slot.
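
A minimal sketch in C of the variant described above; the right pointer sweeps leftward, swapping out-of-order neighbors, so the smallest remaining element ends up in the left pointer's slot.

void BubbleSort( int A[], int n )
{
    for( int left = 0; left < n - 1; left++ )
        for( int right = n - 1; right > left; right-- )
            if( A[right] < A[right-1] )
            {   /* swap the out-of-order neighbors */
                int tmp = A[right];
                A[right] = A[right-1];
                A[right-1] = tmp;
            }
}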

Bucket Sort:  Distribution-based sorting algorithm
Space: O(n) extra space for the buckets (not in-place)
Time: Theta( n ) on average, assuming the input values are uniformly distributed across the buckets
Distribute the n elements into n buckets by value (e.g., for values in [0,1), element A[i] goes into bucket floor(n*A[i])), sort each (expected constant-size) bucket with a simple sort, then concatenate the buckets in order (sketched below).
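
A minimal sketch in C, assuming the inputs are doubles uniformly distributed in [0,1) and using one bucket per element. To keep the sketch simple, each bucket is oversized to n slots (Theta(n^2) space in total); a real implementation would use growable lists.

#include <stdlib.h>

void BucketSort( double A[], int n )
{
    double **bucket = malloc( n * sizeof(double *) );
    int *count = calloc( n, sizeof(int) );
    int out = 0;

    for( int b = 0; b < n; b++ )
        bucket[b] = malloc( n * sizeof(double) );

    /* distribute: bucket b holds values in [b/n, (b+1)/n) */
    for( int i = 0; i < n; i++ )
    {
        int b = (int)( A[i] * n );
        bucket[b][ count[b]++ ] = A[i];
    }

    for( int b = 0; b < n; b++ )
    {
        /* insertion-sort each bucket... */
        for( int i = 1; i < count[b]; i++ )
        {
            double v = bucket[b][i];
            int j = i - 1;
            while( j >= 0 && bucket[b][j] > v )
            {
                bucket[b][j+1] = bucket[b][j];
                j--;
            }
            bucket[b][j+1] = v;
        }
        /* ...then concatenate it back into A */
        for( int i = 0; i < count[b]; i++ )
            A[out++] = bucket[b][i];
        free( bucket[b] );
    }
    free( bucket );
    free( count );
}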

Heap Sort:  In-place sorting algorithm built on a binary heap
Space: In-place, or O(n)
Time: Theta( n*lg(n) )
(Using an array, sorting into ascending order) The basic idea is that we'll build a binary heap on the array (in linear time), then repeatedly extract the max element (each extraction taking O(lg(n)) time) and place it at the end of the array. Since removing the max element reduces the size of the heap by one, each removal frees up a slot in the array that can be used to hold the sorted result.

HeapSort( Array A)
{
    BuildHeap( A );
    int HeapSize = A.length;
    for( i = A.length; i >= 2; i--)
    {
        Swap( A[1], A[i] );   //move the max into its final slot
        HeapSize--;
        Heapify( A, 1, HeapSize );
    }
}
 
/* We'll take an arbitrary array, and make a heap out of it 'bottom-up' using Heapify.*/
 
BuildHeap( Array A)
{
    for( i = A.length / 2; i >= 1; i--)
       Heapify( A, i, A.length );
}
 
/* It is assumed that for the binary tree rooted at index root, its two subtrees are heaps, but that the element at root may not satisfy the heap property.*/
 
Heapify( Array A, int root, int HeapSize)
{
    int left = 2*root;
    int right = 2*root + 1;
    int largest;
    if( left <= HeapSize && A[left] > A[root] )
       largest = left;
    else
       largest = root;

    if( right <= HeapSize && A[right] > A[largest] )
       largest = right;
    if( root != largest )
    {
       Swap( A[root], A[largest] );
       Heapify( A, largest, HeapSize );   //the swap may have broken the subtree's heap property
    }
}

Insertion Sort:  Incremental sorting algorithm
Space: In-place, or O(n)
Time: Theta( n^2 )
(Using an array) Start at the leftmost end.  During each iteration, take the value in the slot just to the right of the sorted area, and insert it into the sorted area so as to maintain the sorted area's nondecreasing property.  Don't forget to move all the array elements greater than the value one slot to the right.

InsertionSort( Array A )
{
    for( index = 1; index < A.length; index++ )
    {
        value = A[index];   //the next unsorted element
        //shift everything in the sorted area that's greater than value one slot right
        i = index - 1;
        while( i >= 0 && A[i] > value )
        {
            A[i+1] = A[i];
            i--;
        }
        A[i+1] = value;   //insert value into the gap
    }
}

Merge Sort:  Divide-and-Conquer sorting algorithm (recursive)
Space: In-place (tricky), or O(n) (the O(n) version may be faster in practice, since it does less memory manipulation).  Note also that we don't need a separate temporary array at every level of the recursion; a single scratch buffer of size n suffices.
Time: Theta( n*lg(n) )
(Using an array) If the array is one element long, return (it's sorted).  Otherwise, divide the array into two halves, recursively sort the left and right halves using Merge Sort, and then merge the two sorted halves together.

MergeSort( Array, start, end )
{
    if( start < end )
    {
        middle = floor( (start+end)/2 );
        MergeSort( Array, start, middle );
        MergeSort( Array, middle + 1, end );
        Merge( Array, start, middle, end );
    }
}

Merge( Array A, start, middle, end )
{
    //copy the two sorted halves into a scratch buffer...
    Temp = copy of A[start..end];
    left = 0;                   //start of the left half, within Temp
    leftMax = middle - start;   //end of the left half, within Temp
    right = leftMax + 1;        //start of the right half, within Temp
    rightMax = end - start;     //end of the right half, within Temp

    //...then repeatedly move the smaller front element back into A
    for( current = start; current <= end; current++ )
    {
        if( right > rightMax || ( left <= leftMax && Temp[left] <= Temp[right] ) )
        {
            A[current] = Temp[left];
            left++;
        }
        else
        {
            A[current] = Temp[right];
            right++;
        }
    }
}
 
Selection Sort:  Incremental sorting algorithm
Space: O(n)
Time: Theta( n^2 ) (the inner loop always scans the remaining elements)
Given a source array A (of length n) and a destination array B (also of length n), iteratively find the smallest unused element of A and put it into B.  If one is able to allocate a boolean array of size equal to A, one can store "alreadyUsed" info there, thus handling both duplicates and not modifying the source array.  If one is allowed to modify the source array, one can use the swapping variant instead.

SelectionSort( Array A, Array B )
{
    int min;
    int iMin;
    int lastMin = NEGATIVE_INFINITY;
    int iLastMin = -1;

    for( int iB = 0; iB < B.length; iB++ )
    {
        min = POSITIVE_INFINITY;   //reset for each output slot
        for( int iA = 0; iA < A.length; iA++ )
        {
            if( A[iA] == lastMin && iA > iLastMin )
            {   //this is the next duplicate of the previous minimum
                min = A[iA];
                iMin = iA;
                break; //to point A
            }
            if( min > A[iA] && A[iA] > lastMin )
            {
                min = A[iA];
                iMin = iA;
            }
        }
        //point A:
        lastMin = min;
        iLastMin = iMin;
        B[iB] = min;
    }
}
 Variant: A can be modified
SelectionSort( Array A )
{
    int min;
    int iMin;

    for( int iB = 0; iB < A.length; iB++ )
    {
        //find the smallest element in the unsorted region A[iB..end]
        min = A[iB];
        iMin = iB;
        for( int iA = iB + 1; iA < A.length; iA++ )
        {
            if( A[iA] < min )
            {
                min = A[iA];
                iMin = iA;
            }
        }
        swap( A[iB], A[iMin] ); //move element iMin to iB and vice-versa, within A
    }
}



Problem: Sort
MultiDimensional


Problem: General Tree Searching
Input: A general tree, with branching factor b and maximum depth m, and a value V to find within the tree
Output: The location of the data within the tree (or, the data stored with the value)

Breadth First Search (BFS):  General Tree Searching Algorithm
Space: O( b^d ) (d = solution depth in tree)
Time: O( b^d ) (d = solution depth in tree)
Optimal?: Yes (finds the value closest to the top)
Complete?: (will always find a value, if present)
(noninfinite branching) Yes
(infinite branching) No
Start by examining the topmost node in the tree.  If it's not the one we want, examine all of its children.  If they aren't it, examine their children.  Continue until you find the value (or exhaust the tree).

Depth First Search (DFS):  General Tree Searching Algorithm
Space: O( b*m ) (m = maximum depth of tree; only the current path and its siblings are remembered)
Time: O( b^m ) (m = maximum depth of tree)
Optimal?: No (may find a deep match before a shallower one)
Complete?: (will always find a value, if present)
(infinite depth) No
(finite depth) Yes
Start by examining the topmost node in the tree.  If it's not the one we want, examine its leftmost child and remember the rest.  If that isn't it, examine that child's leftmost child.  Continue until you find the value (or reach the bottom of the tree).  If you hit the bottom of the tree, go back a level and try the next leftmost child.  Continue until the value is found, or the tree is exhausted.
Variations:
Iterative Deepening:
Space: O( b*d ) (d = solution depth in tree; only the current path is remembered)
Time: O( b^d ) (d = solution depth in tree)
Optimal?: Yes (finds the value closest to the top)
Complete?: (will always find a value, if present)
(infinite branching) No
(finite branching) Yes
Do a depth-limited DFS to depth i = 1.  If the value is found, we're done.  If not, increment i by one and repeat.  Continue until the value is found or the tree is exhausted (a sketch in C follows).
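
A minimal sketch of iterative deepening in C, assuming a simple Node type with an array of children; DepthLimitedSearch is an ordinary DFS that refuses to descend past the given depth limit, and the maxDepth cap stands in for "tree is exhausted".

#include <stddef.h>

typedef struct Node {
    int key;
    int childCount;
    struct Node **children;
} Node;

Node *DepthLimitedSearch( Node *n, int value, int limit )
{
    if( n == NULL )
        return NULL;
    if( n->key == value )
        return n;
    if( limit == 0 )   /* don't descend past the current depth bound */
        return NULL;
    for( int i = 0; i < n->childCount; i++ )
    {
        Node *found = DepthLimitedSearch( n->children[i], value, limit - 1 );
        if( found != NULL )
            return found;
    }
    return NULL;
}

Node *IterativeDeepeningSearch( Node *root, int value, int maxDepth )
{
    /* repeat the depth-limited DFS with a steadily growing bound */
    for( int limit = 0; limit <= maxDepth; limit++ )
    {
        Node *found = DepthLimitedSearch( root, value, limit );
        if( found != NULL )
            return found;
    }
    return NULL;
}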
 
Generic DFS/BFS Algorithm:
TreeSearch( Tree T, Value V )
{
    L = make-queue(); //for BFS
    L = make-stack(); //for DFS
    add( T.root, L ); //start the search at the root
    while( L != empty )
    {
        node = remove-front( L );
        if( V == node.key )
        {
            return node;
        }
        C = Children( node );
        add( C, L );   //enqueue for BFS, push for DFS
    }
    return "failed"
}



Abstract Data Type: Queue
FIFO (first-in, first-out) data structure.  Supports addAtEnd, getFromFront, isEmpty, deleteAll
Can use Linked List, Array, or combo to implement



Abstract Data Type: Stack
LIFO (last-in, first-out) data structure.  Supports addAtFront, getFromFront, isEmpty, deleteAll
Can use Linked List, Array, or combo to implement


Abstract Data Type: Deque (pronounced 'deck')
FIFO or LIFO data structure, depending on which methods are invoked.  Supports addAtEnd, addAtFront, getFromFront, getFromEnd, isEmpty, deleteAll
Can use Linked List, Array, or combo to implement
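
A minimal doubly-linked-list sketch in C. Because a deque supports adds and removes at both ends, the same structure serves as a queue (addAtEnd + getFromFront) or a stack (addAtFront + getFromFront); getFromEnd mirrors getFromFront and is omitted for brevity.

#include <stdlib.h>

typedef struct DequeNode {
    int value;
    struct DequeNode *prev, *next;
} DequeNode;

typedef struct {
    DequeNode *front, *end;   /* both NULL when the deque is empty */
} Deque;

void addAtFront( Deque *d, int value )
{
    DequeNode *n = malloc( sizeof(DequeNode) );
    n->value = value;
    n->prev = NULL;
    n->next = d->front;
    if( d->front != NULL ) d->front->prev = n;
    else                   d->end = n;   /* first element: both ends point at it */
    d->front = n;
}

void addAtEnd( Deque *d, int value )
{
    DequeNode *n = malloc( sizeof(DequeNode) );
    n->value = value;
    n->next = NULL;
    n->prev = d->end;
    if( d->end != NULL ) d->end->next = n;
    else                 d->front = n;
    d->end = n;
}

int getFromFront( Deque *d )   /* assumes the caller checked isEmpty first */
{
    DequeNode *n = d->front;
    int value = n->value;
    d->front = n->next;
    if( d->front != NULL ) d->front->prev = NULL;
    else                   d->end = NULL;
    free( n );
    return value;
}

int isEmpty( Deque *d ) { return d->front == NULL; }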


Abstract Data Type: Hash
Direct Addressing:
Use the key as an index value into a direct-address table (array, usually).  Assumes one slot per possible key in the universe U
Hash Tables:
When the universe U of keys is really large, direct addressing is impractical, especially if only a small subset of U will actually be used.  Store key k into slot h(k), where h is a hash function.  We want h to be as random as possible, as this tends to reduce the number of collisions that happen.  A Simple Uniform Hashing function hashes any given key into any given slot with equal probability.
With m slots and n stored elements, the load factor is Alpha = n/m
Good Hash Functions:
Want something that produces apparently random output.  Realistically, look for a function that derives values independent of any patterns in the data (for example, a compiler's symbol table might often find pt and pts close together).  The keys are assumed to be natural numbers, and most data can be kludged into this form; for example, an ASCII string can be interpreted as an integer written in radix-128 notation.
Division Method:   h(k) = k mod m
If m is a power of 2, this just takes the lower-order bits of k.  Likewise, if m is a power of 10 and decimal numbers are used as keys, we don't use all the digits.  Ideally, we want a function that depends on all the bits, not just the lower-order ones.
If m = 2^p - 1, and k is a character string interpreted in radix 2^p, two strings that are identical except for the transposition of two characters will hash to the same value.
A good choice of m is a prime number that's NOT near a power of 2.
On the upside, it's easy to implement.
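
A minimal sketch of the division method for string keys in C: the string is interpreted in radix 128 (per the note above), reducing mod m at each step to avoid overflow. The table size 701 is just an example of a prime that's not near a power of 2.

#define TABLE_SIZE 701   /* a prime, not near a power of 2 */

unsigned hash( const char *key )
{
    unsigned long k = 0;
    while( *key != '\0' )
        k = ( k * 128 + (unsigned char)*key++ ) % TABLE_SIZE;   /* radix-128, mod m */
    return (unsigned)k;
}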
Multiplication Method:   h(k) = floor( m * ( (k * A) mod 1 ) ), where 0 < A < 1  (i.e., m times the fractional part of k * A)
The value of m isn't critical; it's typically a power of 2, m = 2^p.
Good implementation (assuming k fits into a single w-bit word):
multiply k by floor( A * 2^w ), giving a 2w-bit product r1 * 2^w + r0
the p-bit hash value is the p most significant bits of r0
Universal Hashing: pick a random hash function to use at startup
(the hash function is chosen at random from a class of functions, so the choice is independent of the keys being stored)
A set H of hash functions is UNIVERSAL if, for each pair of distinct keys x,y in U, the number of hash functions h in H for which h(x) = h(y) is sizeof( H ) / m

Collision Resolution Methods:
Chaining:
Each slot is a linked list, collisions are added to the linked list bucket.
Analysis:
Worst case: Theta( n ) (i.e., same as a linked list, when everything hashes to the same slot)
Average case: Search = Theta( 1 + Alpha ) (i.e., computing the hash function plus searching the elements in that slot's list).  Note that if the number of slots (m) is proportional to the number of elements (n), then Alpha = n/m = O(m)/m = O(1), and search is essentially constant time
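
A minimal chaining sketch in C, assuming nonnegative integer keys and h(k) = k mod m:

#include <stdlib.h>

#define M 701   /* number of slots; a prime, per the division method above */

typedef struct Entry {
    int key;
    struct Entry *next;
} Entry;

Entry *table[M];   /* each slot heads a linked list; starts all-NULL */

void insert( int key )
{
    Entry *e = malloc( sizeof(Entry) );
    e->key = key;
    e->next = table[key % M];   /* push onto the front of the bucket's list */
    table[key % M] = e;
}

Entry *search( int key )
{
    Entry *e;
    for( e = table[key % M]; e != NULL; e = e->next )
        if( e->key == key )
            return e;
    return NULL;   /* reached the end of the bucket: not present */
}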

Open Addressing: Store collided elements elsewhere within the table.
INSERT: iteratively compute slots to try storing the element in; put the element in the first open slot.
SEARCH: iteratively compute slots to probe (to see if the element's there); stop when we either find the element or hit an empty slot.
DELETE: SEARCH the hashtable for the element; when we find it, mark the slot DELETED rather than empty (in case something found this slot occupied when trying to INSERT, and so probed past it).
Ideally, we'd like a Uniform Hashing scheme that generates each possible 'random' sequence of slots to probe with equal probability.

Analysis:
Unsuccessful Search = Theta( 1/(1-Alpha) ) expected probes   //same as INSERT
Successful Search = Theta( (1/Alpha) * ln( 1/(1-Alpha) ) ) expected probes
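
A minimal open-addressing sketch in C using linear probing (probe sequence h(k,i) = (k + i) mod m), including the DELETED marker described above; keys are assumed nonnegative, and initTable must be called before use.

#define M 701
#define EMPTY   (-1)
#define DELETED (-2)

int table[M];

void initTable( void )
{
    for( int i = 0; i < M; i++ )
        table[i] = EMPTY;
}

int probe( int key, int i ) { return (key + i) % M; }   /* linear probing */

int insert( int key )
{
    for( int i = 0; i < M; i++ )
    {
        int slot = probe( key, i );
        if( table[slot] == EMPTY || table[slot] == DELETED )
        {
            table[slot] = key;   /* first open slot wins */
            return slot;
        }
    }
    return -1;   /* table is full */
}

int search( int key )
{
    for( int i = 0; i < M; i++ )
    {
        int slot = probe( key, i );
        if( table[slot] == key )
            return slot;
        if( table[slot] == EMPTY )   /* a never-used slot ends the probe */
            return -1;               /* (DELETED slots are probed past) */
    }
    return -1;
}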


Abstract Data Type: Binary Heap
Commonly used to implement a Priority Queue. Supports addElement, getMax (or getMin), deleteMax (or deleteMin), deleteElement.
The term heap is actually a very overloaded word. It can refer to a block of memory from which dynamic memory is allocated, to an area of storage that is garbage-collected, or to the following data structure.
A binary heap is an almost complete binary tree that is stored in an array, with all levels except possibly the last completely filled. The last level is filled from the left, so that once you reach an empty node, all other nodes to the right of that node are empty as well.
For any element at index i (except the root, which is located at index 1), the parent is located at index i/2 (rounding down). The left child of the element at index i is located at index 2*i, and the right child is located at index 2*i+1.
The (max-)heap property is that for any element (except the root), the value of that element is at most the value of its parent. By maintaining this property, we ensure that the greatest element in the heap is always located at the top of the tree (at index 1). We can then use the heap as a priority queue, knowing that it will take only O(lg(n)) time to adjust the structure, since adjusting the structure will involve modifying only a single path between the root and the bottom level of the tree.
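
The index arithmetic above, written out as C helpers (1-based indexing, matching the HeapSort pseudocode earlier on this page):

int Parent( int i ) { return i / 2; }     /* integer division rounds down */
int Left( int i )   { return 2 * i; }
int Right( int i )  { return 2 * i + 1; }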


Abstract Data Type: Priority Queue
Supports addElement, getMax (or getMin), deleteMax (or deleteMin), and deleteElement.  Commonly implemented with the Binary Heap described above.


Webpage by Michael Panitz. All Rights Reserved