Saturday, July 30, 2011

Antipatterns and the Limits of Language

I recently read Bill Karwin's SQL Antipatterns, which details common design practices that are usually bad ideas. I found Karwin's book interesting from a 'scientific' viewpoint, as it provides a taxonomy of common design mistakes. For example, if you see someone design a column that contains multiple comma-separated values, you can accuse them of 'Jaywalking.' If you see a table that contains key-value pairs in an attempt to add some flexibility to the relatively static nature of SQL metadata, you can yell out 'Entity-Attribute-Value'!

These are not rules set in stone, and Karwin is careful to detail the situations in which each antipattern might have a use. It is worth mentioning here that many of the arguments you see about database design stem from the belief that a one-size-fits-all approach is possible. In the end, however, it is typically business requirements that determine database design, and this can often lead to circumstances that require a less-than-ideal approach from an architectural perspective. What was most interesting for me, however, is the kind of the explanation Karwin provides for why you should not use an antipattern. This always comes in the form of something like: SQL wasn't designed to do that. Why is it a bad idea to use self-referencing tables to represent a tree-structure? Because SQL wasn't designed to do that! You'd have to manage all the updates and deletes yourself, and there's no easy way to traverse the tree--at least, if you want to be ANSI compliant.

On the one hand, this kind of answer makes sense. Why should you normalize tables? Because then you benefit from the functionality SQL was designed to implement--namely referential integrity and non-redundancy. Why is it a bad idea to 'clean up' gaps in auto-generated primary keys? Because the value of a key is only important insofar as it is a uniquefier, not as an actual value. In the end, SQL was designed to solve certain problems and not others. If you have a problem that requires you to use SQL in ways that twist your database's architecture into strange contortions, you'll end up with wasted effort and sub-optimal performance.

On the other hand, this kind of answer is disappointing. If SQL can't solve my problem, what will? If my metadata is constantly changing, what should I use instead of SQL? If my data cannot be represented by relations, how could it be effectively structured? Karwin does not really provide these answers. He does tell us what SQL is good for by telling us what we it is not good for, but this is not the same thing as telling us what we should use instead of SQL. Of course, Karwin cannot be blamed for this. These questions about the limits of a language cannot be answered from within the language itself. When we are actually working in a language, we know when we are at its limits: things become very complicated and require a lot of work. You could do statistical regressions in SQL, but it might take an infinite amount of time! It's really only with experience that we understand which languages--and which paradigms--are useful for which problems.

For all of these reasons, a large part of this blog will be dedicated to my exploration of various languages, paradigms, and their limits. Though this is a seemingly-abstract question, it is one that gets at the heart of design and the ways in which we solve problems.

A few links:
-An excerpt from Karwin's book
-Louis David's Ten Common Database Design Mistakes

3 comments:

  1. I kept waiting for the Wittgenstein quote! Excited to see where the blog goes...

    ReplyDelete
  2. Ha, isn't that your job? Just seemed to me like a Wittgenstein-sort of project. The limits of language games and their relations to forms of life and their functions...

    ReplyDelete

Related Posts Plugin for WordPress, Blogger...