Loading...

Algorithms

al-gore.jpg fc11.jpg fc12.jpg
fc15.jpg fc16.jpg fc17.jpg


An Introduction

Unlike software and hardware, an algorithm is timeless. An algorithm is not tied to a particular platform, or programming language. Sometime, novice programmers will look at the high-level methods and routines supported by a language and use that to shape their programs, what Steve McConnell refers to as Programming in a language"; this should not fool you into believing that an algorithm is in any way, shape, or form, dependant on a programming language. Conversely, a seasoned programmer will design their algorithms independent of the language that they will program in, what Steve McConnell refers to as Programming into a language".

An algorithm is a thing, that is universal about a program (Skiena, 1997).

There are two sought-after properties in any algorithm, regardless of what system the algorithm has been designed for...

  1. Correctness
  2. Efficiency

Measuring Algorithm Efficiency

Random Access Machine (RAM)

Action Complexity Description
[+-*/=]|if|call|<memory-access> 1 Simple - A singleton step.
while|for|subroutines >1 Complex - These are compositions of many single-step operations.

Note that in the RAM model, all memory access is treated equally, regardless of the underlying memory used (for example, CPU L1, L2 cache, DIMMS, disc, etc). The reason is that they're all related by a constant, and not an exponential relationship.

As we do in real-life, the RAM model works by simplifying real-world complexities. We often make such abstractions to simplify complex tasks, for example, assuming that the earth is flat when designing a shopping centre is significantly easier than assuming that the world is spherical, and in the size-range boundaries of the problem domain, that works perfectly well.

Under the RAM model, we measure the time complexity of an algorithm, purely based on the number of steps we measure. We can later make an assumption of the number of steps that are executed per second and derive a reasonable estimate of actual time it would take to solve a particular problem.

The RAM model simplifies the complexities of the underlying machines and allows us to measure with reasonable accuracy, the efficiency of a given algorithm. Remember that an efficient algorithm on the slowest machine will always outperform the poorly designed algorithm on the fastest machine, given a suitably large-enough data set. Faster computers are faster by some constant amount, say 1,000 times, or 1,000,000 times faster, however an efficient algorithm is not subject to the same constraint, and could easily shatter the time/space complexity of a problem exponentially.

As einstein said, make all problems as simple as possible, but - no simpler - this is the essence of abstraction.

Asymptotic Analysis (of Worst-Case Complexity)

O(n) Upper Bound Upper Bound
Ω(n) Omega Notation Lower Bound
Θ(n) Theta Notation Tight Bound (Upper Bound AND Lower Bound)

Here's some explanation of these bounds...

Notation Translation Meaning
f(n) = O(g(n)) k⋅g(n) is an upper bound to f(n) There exists some constant k, for which k⋅g(n) is always ≥ f(n), for a large-enough n
f(n) = Ω(g(n)) k⋅g(n) is a lower bound to f(n) There exists some constant k, for which k⋅g(n) is always ≤ f(n), for a large-enough n
f(n) = Θ(g(n)) k⋅g(n) is a lower bound to f(n) There exists some constant k1, for which k1⋅g(n) is always ≤ f(n), and there exists some constant k2, for which k2⋅g(n) is always ≤ f(n), for a large-enough n. That means that g(n) is a nice tight bound on f(n)

In Big-Oh notation, it is sometimes helpful to read the "=", not as the usual mathematical meaning, but as one of the functions that are, for example, n2 is clearly one of the functions that are O(n3) (i.e., upper-bounded by n3), but certainly not the only one.

Dominance

$Given\; f\left( x \right)\; =\; n^{\alpha },\; g\left( x \right)\; =\; n^{\beta },\; f\left( x \right)\; dominates\; g\left( x \right)\; \forall\; x\; if\; \alpha \; >\; \beta$

Machine Learning Algorithms



Analysing an Algorithm

There are two ways to analyze the performance of algorithms...

  1. Benchmarking (empirical testing)
  2. Mathematical Analyses - i.e. the Big-O notation.

Divide & Conquer Algorithms

Such algorithms often have a complexity that is very naturally modelled by recurrent relations. Analysing the algorithm in such instances effectively reduces to solving the recurrent relation, so the first step is to model the algorithm by finding this relation.

Recurrent Relations (Definition)

These are simply equations, that are defined in terms of themselves.

Recursion & Mathematical Induction (Note)

An important thing to realise is this...

Recursion is Mathematical Induction

In both cases, you have a...

  1. Basis Case
  2. General Case

Note

Realize that linear, finite history, constant coefficient recurrences always can be solved (Skiena, 1997).

Proof-by-Induction

  1. Come up with a recurrent relation.
  2. Show that the basis is true using this relation.
  3. Assume for all numbers between, this formula holds, and back-substitute, and solve.

Step 3 should simplify back to the formula form step 1, and if so, we've made a proof-by-induction.

Master Theorem

If your recursive relation is of the following form...

T(n) = A⋅T(n/B) + f(n)

...where A, and B are constants, and f(n) is the additive term, then the Master Theorem can solve your problem.

Now based on the additive value, f(n), we will be dispatched to 1 of 3 possible cases.

Case 1/3

f(n) is small (small-enough)...

We will end up with this...

T(n) = A⋅T(n/B)

Here, the Master Theorem states that almost all of the contribution will be from the leaves of our recursion tree, i.e., "Count the leaves". Of course the number of leaves in a binary tree of size n is equal to 2height, which equals to n itself.

Case 2/3

f(n) is large (large-enough)...

Here, the additive term dominates, and we will end up with this...

T(n) = f(n)

Case 3/3

f(n) is just right...


Systems Algorithms

See  this page.