Algorithms And Data Structures
09/23/98

Algorithms
    Analysis
    Compression
    Graph
    Search
    Sort (1 Dimension)
    Sort (Multi-Dimension)
    Tree Searching
Data Structures (Abstract Data Types)
Binary Search: Divide-and-Conquer Search
Space: O(1) iterative (the recursive version uses O( log2(n) ) stack space)
Time: O( log2(n) )
BinarySearch( Array A, int value, int start, int end )
{
    //base case: empty range
    if( start > end )
        return VALUE_NOT_FOUND;

    //inductive case
    int middle = floor( (start+end) / 2 );
    if( A[middle] == value )
        return middle;
    else if( value > A[middle] )
        return BinarySearch( A, value, middle+1, end );
    else
        return BinarySearch( A, value, start, middle-1 );
}
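The same algorithm as runnable Python (a minimal sketch; it returns -1 in place of a VALUE_NOT_FOUND constant):

```python
def binary_search(a, value, start, end):
    """Recursive binary search over the sorted slice a[start..end] (inclusive)."""
    if start > end:
        return -1  # value not found
    middle = (start + end) // 2
    if a[middle] == value:
        return middle
    elif value > a[middle]:
        return binary_search(a, value, middle + 1, end)
    else:
        return binary_search(a, value, start, middle - 1)
```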
Huffman Encoding: Greedy Algorithm for an Optimal, Variable-Length Code
Space: O(n) (n = number of distinct characters)
Time: O( n*lg(n) ) (using a binary min-heap as the priority queue)
When designing a binary character code, we can use either a
fixed-length code, like ASCII, where every character requires the
same amount of space, or a variable-length code, where different characters
require different amounts of space. We can take advantage of variable-length codes
by assigning frequently used characters shorter codes, and thus save space. If we
convert a document that uses a fixed-length code into one that uses a variable-length
code that has been optimized for this document, we can often compress the document
efficiently. The Huffman code is a prefix code, meaning that no codeword is a prefix
of any other codeword. It can be proven that prefix codes are capable of achieving
optimal compression among character codes.
The Huffman code will be constructed with the aid of a binary tree structure. The
overall outline is as follows: put each character into a single-node tree, weighted
by its frequency; repeatedly extract the two trees of lowest weight and merge them
under a new parent whose weight is the sum of the two; when one tree remains, read
each character's codeword off its root-to-leaf path (0 = left branch, 1 = right branch).
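As a concrete sketch, here is one way to build the codes in Python, using the standard-library heapq module as the priority queue (the tuple-based tree representation and the integer tiebreaker are illustrative choices, not fixed by the algorithm):

```python
import heapq

def huffman_codes(freq):
    """Build an optimal prefix code from a {symbol: frequency} map."""
    # Each heap entry is (frequency, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair. The tiebreak keeps comparisons on ints.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:  # degenerate case: a single symbol gets code "0"
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two lowest-weight trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # merge and reinsert
        count += 1
    codes = {}
    def walk(tree, prefix):  # read codewords off root-to-leaf paths
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

Any tie-breaking order yields a tree of the same (optimal) total encoded length.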
Bubble Sort: Incremental sorting algorithm
Space: In-place, or O(n)
Time: Theta( n^2 )
(Using an array, sorting into ascending order) Start at the leftmost
end of the array; the slot under consideration moves one to the right
each outer iteration. During each outer iteration, start another pointer
at the rightmost end of the array and move it left, comparing each pair of
adjacent elements and swapping them if out of order, until the right pointer
points to the same slot as the left pointer. Each pass thus bubbles the
smallest remaining element down into the left pointer's slot.
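A minimal Python version of this right-to-left sweep:

```python
def bubble_sort(a):
    """Sort list a in place, ascending: each pass sweeps right-to-left,
    bubbling the smallest remaining element down into slot i."""
    n = len(a)
    for i in range(n - 1):             # left pointer: boundary of the sorted prefix
        for j in range(n - 1, i, -1):  # right pointer sweeps left toward i
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
    return a
```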
Bucket Sort: Incremental sorting algorithm
Space: O(n) auxiliary (not in-place)
Time: Theta( n ) average, assuming uniformly distributed keys
(Using an array of keys assumed uniform over a known range) Split the range
into n equal-sized buckets, distribute the elements into their buckets, sort
each bucket (e.g., with insertion sort), and concatenate the buckets in order.
Because the expected bucket size is constant, the total expected time is linear.
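A sketch in Python, assuming the keys are uniformly distributed over [0, 1) (the usual textbook setting; any known range works after rescaling):

```python
def bucket_sort(a):
    """Sort values assumed uniform in [0, 1): distribute into n buckets,
    sort each small bucket, then concatenate."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)  # bucket index from the key itself
    result = []
    for b in buckets:
        b.sort()  # buckets are constant-sized on average, so this stays linear overall
        result.extend(b)
    return result
```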
Heap Sort: Incremental sorting
algorithm
Space: In-place, or O(n)
Time: Theta( n*lg(n) )
(Using an array, sorting into ascending order) The basic idea is that we'll build
a binary heap on the array (in linear time), then repeatedly extract the max element
(each extraction taking O(lg(n)) time), placing it at the end of the array. Since
removing the max element reduces the size of the heap by one, each removal frees a
slot at the end of the array that can be used to hold the sorted result.
HeapSort( Array A )
{
    BuildHeap( A );
    int HeapSize = A.length;
    for( i = A.length; i >= 2; i-- )
    {
        Swap( A[1], A[i] );
        HeapSize--;
        Heapify( A, 1, HeapSize );
    }
}
/* We'll take an arbitrary array, and
make a heap out of it 'bottom-up' using Heapify. */
BuildHeap( Array A )
{
    for( i = A.length / 2; i >= 1; i-- )
        Heapify( A, i, A.length );
}
/* It is assumed that the two subtrees of the binary tree rooted at
index root are heaps, but that the element at root may not satisfy
the heap property. */
Heapify( Array A, int root, int HeapSize )
{
    int left = 2*root;
    int right = 2*root + 1;
    int largest;
    if( left <= HeapSize && A[left] > A[root] )
        largest = left;
    else
        largest = root;
    if( right <= HeapSize && A[right] > A[largest] )
        largest = right;
    if( root != largest )
    {
        Swap( A[root], A[largest] );
        Heapify( A, largest, HeapSize );
    }
}
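The same routines as runnable Python, converted to 0-based indexing (children at 2*root+1 and 2*root+2 instead of 2*root and 2*root+1):

```python
def heapify(a, root, heap_size):
    """Sift a[root] down, assuming both subtrees of root are already max-heaps."""
    left, right = 2 * root + 1, 2 * root + 2
    largest = root
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != root:
        a[root], a[largest] = a[largest], a[root]
        heapify(a, largest, heap_size)

def build_heap(a):
    """Make a max-heap out of an arbitrary array, bottom-up."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))

def heap_sort(a):
    """Sort list a in place, ascending."""
    build_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current max to the end
        heapify(a, 0, end)            # restore the heap on the shrunken prefix
    return a
```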
Insertion Sort: Incremental sorting
algorithm
Space: In-place, or O(n)
Time: Theta( n^2 )
(Using an array) Start at the leftmost slot. During each iteration: take
the value in the slot to the right of the sorted area, and sort it into
the sorted area by iterating over the sorted area and inserting the
value so as to maintain the nondecreasing property of the sorted area.
Don't forget to move all the array elements greater than the value one
slot to the right.
InsertionSort( Array )
{
    for( index = 1; index < Array.length; index++ )
    {
        value = Array[index];   //save before the shift overwrites this slot
        for( i = 0; i < index; i++ )
        {
            if( value <= Array[i] )
            {
                MoveMemoryFromToCount( Array[i], Array[i+1], index-i );
                Array[i] = value;
                break;
            }
        }
    }
}
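A runnable Python version; in place of a MoveMemoryFromToCount helper it shifts elements one at a time while scanning right-to-left:

```python
def insertion_sort(a):
    """Sort list a in place by growing a sorted prefix one element per iteration."""
    for index in range(1, len(a)):
        value = a[index]          # save before shifting overwrites the slot
        i = index - 1
        while i >= 0 and a[i] > value:
            a[i + 1] = a[i]       # shift larger elements one slot right
            i -= 1
        a[i + 1] = value
    return a
```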
Merge Sort: Divide-and-Conquer sorting algorithm (recursive)
Space: In-place (tricky), or O(n) auxiliary (which might be better b/c of less
memory manipulation). Note also that we don't need a separate array per merge;
one scratch buffer can be reused throughout.
Time: Theta( n*log2(n) )
(Using an array) If the array is one element long, return (it's sorted).
Otherwise, divide the array into two halves. Recursively sort the
left and right halves
using Merge Sort. Merge the two sorted halves together.
MergeSort( Array, start, end )
{
    if( start < end )
    {
        middle = floor( (start+end)/2 );
        MergeSort( Array, start, middle );
        MergeSort( Array, middle+1, end );
        Merge( Array, start, middle, end );
    }
}
Merge( Array, start, middle, end )
{
    left = start;
    right = middle+1;
    while( left <= middle && right <= end )
    {
        if( Array[left] <= Array[right] )
        {
            left++;   //already in place
        }
        else
        {
            current = Array[right];
            //shift Array[left..right-1] right by one slot
            MoveMemoryFromToCount( Array[left], Array[left+1], right-left );
            Array[left] = current;
            left++;
            middle++;   //the left (sorted) run grew by one
            right++;
        }
    }
    //whichever run remains is already in its final position
}
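A runnable Python sketch using the O(n)-auxiliary-space merge (a temporary list) rather than the in-place shifting variant:

```python
def merge_sort(a, start, end):
    """Sort a[start..end] (inclusive) recursively."""
    if start < end:
        middle = (start + end) // 2
        merge_sort(a, start, middle)
        merge_sort(a, middle + 1, end)
        merge(a, start, middle, end)
    return a

def merge(a, start, middle, end):
    """Merge the sorted runs a[start..middle] and a[middle+1..end]."""
    left, right = start, middle + 1
    temp = []
    while left <= middle and right <= end:
        if a[left] <= a[right]:
            temp.append(a[left]); left += 1
        else:
            temp.append(a[right]); right += 1
    temp.extend(a[left:middle + 1])   # leftover of the left run, if any
    temp.extend(a[right:end + 1])     # leftover of the right run, if any
    a[start:end + 1] = temp           # copy the merged result back
```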
Selection Sort: Incremental sorting algorithm
Space: O(n) (O(1) extra for the swapping variant)
Time: Theta( n^2 ) in all cases (the inner scan always runs to the end)
Given source array A (of length n), and a destination array B (also
of length n), iteratively find the smallest element of A, and put it into
B. If one is able to allocate a boolean array of size equal to A,
one can store "alreadyUsed" info there, thus handling both duplicates and
not modifying the source array. If one is allowed to modify the source array,
one can use the swapping variant instead.
SelectionSort( Array A, Array B )
{
    int lastMin = NEGATIVE_INFINITY;
    int iLastMin = -1;
    for( int iB = 0; iB < B.length; iB++ )
    {
        int min = POSITIVE_INFINITY;   //reset for each output slot
        int iMin = -1;
        for( int iA = 0; iA < A.length; iA++ )
        {
            if( A[iA] == lastMin && iA > iLastMin )
            {   //if this is the next duplicate
                min = A[iA];
                iMin = iA;
                break;   //to point A
            }
            if( min > A[iA] && A[iA] > lastMin )
            {
                min = A[iA];
                iMin = iA;
            }
        }
        //point A:
        lastMin = min;
        iLastMin = iMin;
        B[iB] = min;
    }
}
Variant: A can be modified
SelectionSort( Array A )
{
    for( int iB = 0; iB < A.length; iB++ )
    {
        int min = POSITIVE_INFINITY;   //reset for each pass
        int iMin = iB;
        for( int iA = iB; iA < A.length; iA++ )
        {
            if( min > A[iA] )
            {
                min = A[iA];
                iMin = iA;
            }
        }
        swap( A[iB], A[iMin] ); //move element iMin to iB and vice-versa, within A
    }
}
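The swapping variant as runnable Python:

```python
def selection_sort(a):
    """Swapping variant: repeatedly move the minimum of the unsorted
    suffix to its front, in place."""
    for i in range(len(a)):
        i_min = i
        for j in range(i + 1, len(a)):
            if a[j] < a[i_min]:
                i_min = j
        a[i], a[i_min] = a[i_min], a[i]   # swap (a no-op when i_min == i)
    return a
```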
Breadth First Search (BFS): General Tree Searching Algorithm
Space: O( b^d ) (b = branching factor, d = solution depth in tree)
Time: O( b^d )
Optimal?: Yes (finds the value closest to the top)
Complete?: (will always find a value, if present)
    (noninfinite branching) Yes
    (infinite branching) No
Start by examining the topmost node in the tree. If it's not
the one we want, examine all of its children. If they aren't it,
examine their children. Continue until you find the value
(or exhaust the tree).
Depth First Search (DFS): General Tree Searching Algorithm
Space: O( b*m ) (b = branching factor, m = maximum depth of tree)
Time: O( b^m )
Optimal?: No (does not necessarily find the value closest to the top)
Complete?: (will always find a value, if present)
    (infinite depth) No
    (finite depth) Yes
Start by examining the topmost node in the tree. If it's not
the one we want, examine its leftmost child and remember the rest.
If that isn't it, examine the child's leftmost child. Continue until
you find the value (or get to the bottom of the tree). If you hit
the bottom of the tree, go back a level, and try the next leftmost child.
Continue until the value is found, or the tree is exhausted.
Variations:
Iterative Deepening:
Space: O( b*d ) (d = solution depth in tree)
Time: O( b^d )
Optimal?: Yes (finds the value closest to the top)
Complete?: (will always find a value, if present)
    (infinite branching) No
    (finite branching) Yes
Do a DFS to depth i = 1. If the value is found, we're done.
If not, increment i by one and repeat. Continue until the value is found
or the tree is exhausted.
Generic DFS/BFS Algorithm:
TreeSearch( Tree T, Value V )
{
    L = make-queue();   //for BFS
    L = make-stack();   //for DFS
    add( root(T), L );
    do
    {
        node = remove-front( L );
        if( V == node.key )
        {
            return node;
        }
        C = Children( node );
        add( C, L );
    } while( L != empty );
    return "failed";
}
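A runnable Python sketch of the generic search: a collections.deque serves as both the queue (BFS) and the stack (DFS), and nodes are represented as (key, children) pairs (an illustrative representation, not the only choice):

```python
from collections import deque

def tree_search(root, value, mode="bfs"):
    """Generic tree search. Popping from the left of the deque gives BFS
    (queue behavior); popping from the right gives DFS (stack behavior).
    Nodes are (key, [child, ...]) pairs."""
    frontier = deque([root])
    while frontier:
        key, children = frontier.popleft() if mode == "bfs" else frontier.pop()
        if key == value:
            return (key, children)
        frontier.extend(children)
    return None  # "failed": value not present
```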
Abstract Data Type: Hash
Direct Addressing:
Use the key as an index value into a direct address table (array, usually).
Assumes 1 slot per element
Hash Tables:
When the universe U is really large, direct addressing is impractical,
especially if only a small subset of the keys will actually be used.
Instead, store key k into slot h(k), where h is a hash function. We want
h to be as random as possible, as this tends to reduce the number of
collisions. Under Simple Uniform Hashing, any given key is equally likely
to hash into any given slot.
With m slots and n elements, the load factor is Alpha = n/m.
Good Hash Functions:
Want something that produces apparently random output. Realistically,
look for a function that derives values independent of any patterns in
the data. (For example, a compiler's symbol table might often find pt
and pts close together.) The keys are assumed to be natural
numbers, and most data can be kludged into this form. For example,
an ASCII string could be interpreted as a number written in radix 128.
Division Method: h(k) = k mod m
If m is a power of 2, this will take the lower-order bits. Likewise,
if m is a power of 10, and decimal numbers are used as keys, we don't use
all the bits. Ideally, we want a function that depends on all the
bits, not just the lower-order ones.
If m = 2^p - 1, and k is a character string interpreted in
radix 2^p, two strings that are identical except for the transposition of
two characters will hash to the same value.
A good choice of m is a prime number that's NOT near a power of 2.
On the upside, the division method is easy to implement.
Multiplication: h(k) = Floor( m * ( (k * A) mod 1 ) ), where 0 < A < 1
(ie, m times the fractional part of k * A)
The value of m isn't critical; it's typically a power of 2, m = 2^p.
Good implementation (assuming k fits into a word, and words are w bits):
multiply k by floor( A * 2^w ); the 2w-bit product is r1 * 2^w + r0,
and the p-bit hash value is the p most significant bits of r0.
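A runnable Python sketch of the multiplication method, using Knuth's suggested constant A = (sqrt(5)-1)/2:

```python
def hash_multiplication(k, p, w=32):
    """h(k) = the p most significant bits of the low-order word of
    k * floor(A * 2^w), per the multiplication method."""
    A = (5 ** 0.5 - 1) / 2          # Knuth's suggestion, ~0.6180339887
    s = int(A * (1 << w))           # floor( A * 2^w ) = 2654435769 for w = 32
    r0 = (k * s) & ((1 << w) - 1)   # low-order word r0 of the product r1*2^w + r0
    return r0 >> (w - p)            # p most significant bits of r0
```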
Universal Hashing: pick a random hash function to use at startup.
Hash functions are chosen from a class of functions designed so that the
slots generated are independent of any pattern in the keys.
A set H of hash functions is UNIVERSAL if for all distinct x, y in
U, the number of hash functions h in H for which h(x) = h(y) is
at most sizeof( H ) / m.
Collision Resolution Methods:
Chaining:
Each slot is a linked list; collisions are added to the linked-list
bucket.
Analysis:
Worst case: Theta( n ) (ie, same as a linked list, when everything hashes
to the same slot)
Average case: Search = Theta( 1 + Alpha ) (ie, hash function plus elements
searched). Note that if the number of slots (m) is proportional to
the number of elements (n), then Alpha = n/m = O(m)/m = O(1), and search
will be pretty much constant time.
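A minimal chained hash table in Python (using the built-in hash as the hash function and Python lists as the chains):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining; each of the m
    slots holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def insert(self, key, value):
        chain = self.slots[hash(key) % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[hash(key) % self.m]:
            if k == key:
                return v
        return None
```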
Open Addressing: Store collided elements elsewhere within the
table itself.
INSERT: iteratively compute slots to try storing the element in; put
the element in the first open slot.
SEARCH: iteratively compute slots to probe (to see if the element's
there); stop when we either find the element or hit an empty slot.
DELETE: SEARCH the hashtable for the element; when we find it, mark
the slot deleted (not empty, in case something found this
slot occupied when trying to INSERT).
Ideally, we would like a Uniform Hashing scheme that actually
generates a 'random' sequence of slots to probe.
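A sketch of open addressing in Python, using linear probing as the (simple, non-uniform) probe sequence and including the 'deleted' marker described above; keys are assumed distinct:

```python
DELETED = object()  # tombstone: distinct from an empty (None) slot

class OpenAddressTable:
    """Open addressing with linear probing and tombstone deletion."""

    def __init__(self, m=16):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key):
        start = hash(key) % self.m
        for i in range(self.m):           # linear probe sequence
            yield (start + i) % self.m

    def insert(self, key):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i] is DELETED:
                self.slots[i] = key       # first open slot wins
                return i
        raise RuntimeError("table full")

    def search(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:     # truly empty: key cannot be further along
                return None
            if self.slots[i] == key:
                return i
        return None                       # probed every slot

    def delete(self, key):
        i = self.search(key)
        if i is not None:
            self.slots[i] = DELETED       # mark deleted, not empty
```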