Sunday, March 25, 2012

Is Statistics the Key to the Soul?

I've been brushing up on my statistics, since I was convinced I didn't learn anything practical in my two semesters of calculus-based 'statistics.' It turns out that I did learn a thing or two besides how to integrate Poisson distributions. What struck me most about statistics this time around is its objective power. I'm coming to believe that the history of human progress is the history of increasing abstraction--from markets, which abstract price from use value; to language, which substitutes abstract signs for the world's infinitude; to representative governments, which generate the will of the people from voter preferences; to information theory, which abstracts universally-understood 0's and 1's from meaning. Each new power of abstraction provides a new tool for humans to shape the world.

Borges' library has been found!
Of course, abstraction has its price. There's always something lost in the process of abstraction. Jorge Luis Borges (the author of the story for which this blog is named) writes at length about the experience of hitting the limits of abstraction: the real world. For example, if all knowledge were written down, we could never know anything, since we'd spend all our time sifting through an infinite number of books contained in an infinitely-forking library. If we remembered every experience we had in its minute detail, we'd never be able to live in the present or learn from the past. Similarly, the will of the minority loses out to the will of the majority, and statistics can never replicate lived experience.

Processes of abstraction can also be fetishized for their own sake. The operations of markets, the intricacies of language, the back-and-forth of the political process, the elegance of algorithms, or the endless march of statistical analyses are all deep, deep rabbit holes from which many never return. There's a point at which every student of philosophy--having been convinced by philosopher after philosopher--decides that it is impossible to determine who is right and resolves, if only for a short time, to study philosophy solely for the beauty of its systems. There's even a perverse pleasure in the counter-intuitive nature of abstract thought, such as learning that rent control makes rent higher or that work = 0 when something is moved a great distance before returning to its origin.

One of the most interesting characteristics of processes of abstraction is the ambiguity inherent in them. Since abstractions miss something real, they are always equivocations. This happens in language when we can't decide what to call something. Is Pluto a planet or an asteroid? In statistics, ambiguity appears in the form of studies that contradict each other. Are eggs good for you or not, for chrissake? Statistical analyses are objective and help us overcome the biases of our thought processes, such as when they show us the irrationality of our fear of flying, sharks, and home invasion. But it's easy to slice the world into irreconcilable parts when those parts are so small in comparison to the actual, ever-changing world. Scientists often can't reproduce the results of their experiments. This means that either 1) the laws or regularities of the world are not the same now as they were at the time of the experiment or 2) there is some variable which they have not accounted for. And there are always variables that are not accounted for.

With the growing trend of personal data collection, from activity tracking to sleep monitoring to mental acuity quantification, we will soon be able to analyze ourselves with ever greater scrutiny. If you want to know exactly how far you've run in the last five years, you can do that. Does this make you a better runner? That is not clear, but the trend is. We'll soon be able to use the data mining techniques that advertisers like Google created on ourselves. In some ways, this is exciting. What better way to know thyself than with objective data? Conquering ourselves may be the next frontier of the powers of abstraction. But this path will be fraught with even greater dangers of experience lost, fetishization, and ambiguity.


  1. A statistical approach that I've found very useful and fascinating, which you might enjoy reading about (if you haven't already encountered it), is something called the "Maximum Entropy Principle", which involves maximizing Shannon's informational entropy H = -K*sum_m(p_m*ln(p_m)) subject to constraints in order to find the functional form for the probabilities p_m that should be assigned to each microstate m of a particular system. The richness comes about through the choice of constraints -- in addition to normalization of the probability distribution to unity, for thermodynamic calculations one can introduce, say, the constraint that <E> = sum_m(p_m*E_m) is known experimentally (average system energy) and the system is closed -- out comes the functional form for p_m of a canonical ensemble in statistical mechanics. You can add in that the average number of particles is known and that the system is open -- out comes the p_m for a grand canonical ensemble. In other words, by writing what we "know" and maximizing our uncertainty in what we don't know (i.e., introducing no additional bias or implied information), one can recover a lot of important results from a variety of fields.

  2. What if, as you measure yourself on running, as you suggest, you find that your times & performance are declining as a result of age? You might try to work harder, but you also might get depressed and give up, so, as in quantum physics, the act of measuring actually affects the outcome.

  3. Jon, Guess I still have a few things to learn about statistics! I have a passing familiarity with Shannon's work and the relation between entropy and information, but I'm having trouble connecting this to Bayes.

    BDW, I like the connection to Heisenberg--is data destiny? But I'd also add that, just because your times are declining as you age, there are many other variables you'd have to rule out in order to show that it's _because_ of age. I think this gets to Jon's point that all we can know are probabilities of correlation, and it all depends on how we choose constraints.
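
For anyone who wants to play with the Maximum Entropy Principle from the first comment, here is a minimal sketch in Python. It assumes a made-up system with four evenly spaced energy levels and takes K = 1; the function names, the bisection bounds, and the target average energy are all illustrative choices, not anything from the comment itself. The constraints are normalization plus a known average energy <E>, which gives the canonical form p_m = exp(-beta*E_m)/Z. The code simply searches for the beta that reproduces the requested average.

```python
import math

def boltzmann(energies, beta):
    """Canonical-ensemble probabilities p_m = exp(-beta*E_m) / Z."""
    # Shift by the minimum energy; the shift cancels in the ratio.
    e0 = min(energies)
    weights = [math.exp(-beta * (e - e0)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, probs):
    """Average energy <E> = sum_m p_m * E_m."""
    return sum(p * e for p, e in zip(probs, energies))

def entropy(probs):
    """Shannon entropy H = -sum_m p_m ln p_m (taking K = 1)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def maxent_distribution(energies, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Find beta by bisection so that <E> matches target_mean.

    The bounds lo and hi are arbitrary assumptions that suit small,
    order-one energy values like the ones in the example below.
    """
    # <E> falls monotonically as beta rises, so bisection is enough.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, boltzmann(energies, mid)) > target_mean:
            lo = mid  # average too high: need a larger beta
        else:
            hi = mid
        if hi - lo < tol:
            break
    beta = 0.5 * (lo + hi)
    return beta, boltzmann(energies, beta)

# Hypothetical four-level system with a "measured" average energy of 1.0.
energies = [0.0, 1.0, 2.0, 3.0]
beta, probs = maxent_distribution(energies, target_mean=1.0)
print("beta:", beta)
print("probabilities:", probs)
print("entropy:", entropy(probs))
```

As a sanity check on the principle, any other distribution with the same average energy, such as putting half the weight on the first level and half on the third, should come out with a lower entropy than the one found here.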


