Saturday, June 22, 2013

Logarithm Confessional

The truth is, I never really understood logarithms.  Like many students, I was just going through the motions--memorizing the rules without ever having them click.  Whenever I heard about people using logarithms and slide rules to multiply numbers in the days Before Calculators (BC), I was confused.  In college, I understood that algorithms with logarithmic or linearithmic (n log n) growth were better than quadratic ones, but I didn't have a good intuition for exactly how much better they were.

Does it matter?  I've come to think it's essential for developers to have a good sense of logarithms and the other functions that describe how algorithms behave.  Even if you're not writing many algorithms from scratch, it's important to know what's going on under the hood when you search a database or use a hash table.  And even as computing power gets cheaper and cheaper, you can't always throw hardware at a problem.

I think an example helps.  Let's say you want to find a name in a phone book, and let's say there are 1,000,000 names in the phone book.  What if the names in the phone book were in random order?  In the worst case, you might have to look at all 1,000,000 names.  If it takes 1 second to look at 10 names, this could take around 28 hours.  In the best case, you only have to look at 1 name, but on average you'll have to look through 500,000.
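
To make that arithmetic concrete, here's a quick back-of-the-envelope check in Python (the numbers are just the ones from the example above):

```python
names = 1_000_000   # entries in the phone book
rate = 10           # names we can scan per second

print(names / rate / 3600)       # worst case: ~27.8 hours
print(names / 2 / rate / 3600)   # average case: ~13.9 hours
```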

Luckily, alphabetical ordering is an old trick--Robert Cawdrey used it in 1604 for the first English dictionary.  If the phone book is in alphabetical order, how long does it take to find any name?  The answer is 2 seconds, if you flip to the middle of the book and keep dividing it in half until you find the name (and if you can still look at 10 names in 1 second).  The first name you look at lets you ignore 500,000 other names.  The second lets you cut out another 250,000.  Since the number of names we're considering is cut in half each time, this is a logarithmic algorithm, and it will take at most 20 steps to find any name.  This follows from the fact that 2^20 = 1,048,576, or equivalently, that log2(1,048,576) = 20.
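
Here's a minimal sketch of that divide-in-half strategy as binary search in Python.  The `phone_book` list of zero-padded names is a made-up stand-in for a real phone book; in real code you'd reach for the standard library's `bisect` module instead of rolling your own:

```python
def binary_search(sorted_names, target):
    """Find target in sorted_names, returning (index, comparisons used)."""
    lo, hi = 0, len(sorted_names) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_names[mid] == target:
            return mid, steps
        elif sorted_names[mid] < target:
            lo = mid + 1      # name is in the upper half
        else:
            hi = mid - 1      # name is in the lower half
    return -1, steps          # not found

phone_book = [f"name{i:07d}" for i in range(1_000_000)]   # already sorted
print(binary_search(phone_book, "name0999999"))           # found in at most 20 steps
```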

What's amazing is that it hardly takes any more time to find a name as the phone book gets bigger and bigger.  It would take 3 seconds to look through 1,000,000,000 names and 4 seconds to look through 1,000,000,000,000.  Logarithms are the inverse of exponents, and everyone knows that exponential growth is really fast growth--so logarithmic growth is really slow growth.  In fact, it's almost constant.
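
You can see how flat logarithmic growth is by computing the step counts directly (same assumed rate of 10 comparisons per second as above):

```python
import math

for n in (10**6, 10**9, 10**12):
    steps = math.ceil(math.log2(n))
    print(f"{n:>16,} names -> {steps} steps -> {steps / 10:.0f} seconds")
```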

This example shows why it's important to look for algorithms that take a logarithmic amount of time as the data size increases.  The table below shows common algorithmic growth rates.  For example, trying to compare all possible moves in a chess game could take longer than the age of the universe, since the number of boards increases exponentially.  And it's hugely important to find an n log n approach over an n^2 one.  How huge?  About 32 years if you're working with 1,000,000,000 data points.
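
To put a number on that: assuming a machine that performs a billion operations per second (an assumed rate, chosen because it matches the 32-year figure), the arithmetic works out like this:

```python
import math

OPS_PER_SECOND = 10**9                 # assumed machine speed
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

n = 10**9
n_log_n_ops = n * math.log2(n)         # ~3 * 10^10 operations
n_squared_ops = n ** 2                 # 10^18 operations

print(n_log_n_ops / OPS_PER_SECOND)                        # ~30 seconds
print(n_squared_ops / OPS_PER_SECOND / SECONDS_PER_YEAR)   # ~31.7 years
```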

Commit it to memory!  And read Steve Skiena's book, The Algorithm Design Manual.

If you are like I once was, don't curse your pathetic existence.  Humans are simply not good at understanding exponential or logarithmic growth--just as we're not good at memorizing random sequences of numbers even though we excel at remembering spatial information (mental athletes typically turn memorization problems into spatial ones).  It makes no intuitive sense that $1.00 compounded at 10% annual growth for 100 years somehow becomes $13,780.61.  We can work hard to develop an intuition for such things, but it's easy to forget.  What helps me is revisiting tables like these from time to time.
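
That compound-interest figure is easy to verify, even if it's hard to believe:

```python
# $1.00 growing at 10% per year for 100 years
print(f"${1.10 ** 100:,.2f}")   # $13,780.61
```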

2 comments:

  1. Log base 2 is one of the only ones I have a good intuition for, thanks to binary. 2^10 being approximately a thousand (and 2^20 approximately a million) is a great rule of thumb. :)

  2. It's also pretty interesting that it doesn't matter too much what base you use, i.e. how wide your tree is. Log base 10 of a million is 6. Ln is around 14.

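
The change-of-base rule behind that observation, log_b(x) = ln(x) / ln(b), means every logarithm differs from every other only by a constant factor--easy to check:

```python
import math

n = 10**6
print(math.log2(n))    # ~19.93
print(math.log10(n))   # 6.0
print(math.log(n))     # ~13.82 (natural log)
```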
