the Garden of Forking Paths: December 2011

Sunday, December 18, 2011

Technical Books I Read This Year

Finally, the moment you've all been waiting for: a complete and centralized list of all the technical books I've read this year. And just in time for those doing last-minute shopping for me. I didn't rank them, since whether or not a book is good depends upon what you want from it. Troelsen's C# books give you a lot of breadth, but not a lot of depth. Skeet's go into great detail about the problems different components of C# are meant to solve, but sometimes you just need some examples to get going.

This list doesn't include anything I read to pass Microsoft exams (which was mostly just the 'training kits' Microsoft publishes anyway). It also doesn't include all the other books I read, from Middlemarch to Proficient Motorcycling. I'm sure these had some impact on my thinking about software development, but I had to stop somewhere.

High-Level:
-Passionate Programmer, Chad Fowler
-Pragmatic Programmer, Andy Hunt and David Thomas
-Seven Languages in Seven Weeks, Bruce Tate

Database:
-Database Refactoring, Scott Ambler and Pramod Sadalage
-SQL Antipatterns, Bill Karwin
-Mastering SQL Server Profiler, Brad McGehee
-Brad's Sure Guide to SQL Server Maintenance Plans, Brad McGehee
-How to Become an Exceptional DBA, Brad McGehee
-SQL Server Team-Based Development, Mladen Prajdic, Grant Fritchey, Alex Kuznetsov, and Phil Factor
-SQL Server Statistics, Holger Schmeling
-SQL Tuning, Dan Tow

C#:
-C# In Depth, Jon Skeet
-Pro C#, Andrew Troelsen
-Silverlight 4 in C#, Robert Lair

Somewhat Random:
-At Home, Bill Bryson
-Readings in the Philosophy of Technology, David Kaplan
-Gödel's Proof, Ernest Nagel and James Newman
-Learned Optimism, Martin Seligman

Books for 2012:
-The Mythical Man-Month, Fred Brooks
-The Art of SQL, Stéphane Faroult
-Kimball's Data Warehouse books
-The Visual Display of Quantitative Information, Edward Tufte

And, of course, the training kits for the exams on WCF, ADO.NET, SQL BI, and perhaps something for SQL 2012.

Read any books worth sharing this year?

LINQ and Functional Object-Oriented Programming

It took me a while to come around to the "functional turn" in object-oriented programming. This includes the growth of Clojure and Haskell, as well as Microsoft's LINQ and its new F# language. I've been familiar with functional languages like LISP for some time, but it was hard for me to see how such an austere and limited paradigm could be fruitfully imported into the world of polymorphic objects. Furthermore, as someone well familiar with the dangers of bad SQL programming, the last thing I wanted was to allow a Zelda character to write my queries for me.

I haven't changed my mind about using LINQ to access a database, because database performance is too important to me. However, I have found LINQ to be extremely useful for shredding and joining data into custom data structures made up of generic data types. I do this to pass information between layers of abstraction, such as through WCF, or between classes. For example, you could turn a data table into a list of dictionaries, which could be serialized easily into a custom contract. You could then shred that returned structure into whatever you want, such as key value pairs in an IEnumerable of an anonymous type defining points in a chart.
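Here is a minimal sketch of that round trip, using LINQ to Objects only. The names TableShredder, ToDictionaries, and ToPoints are made up for illustration, as is the assumption that the table has a "Date" column of type DateTime and a numeric metric column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper class; none of these names come from a library.
public static class TableShredder
{
    // Turn each DataRow into a column-name -> value dictionary.
    // DBNull is mapped to null so the result holds only plain CLR values.
    public static List<Dictionary<string, object>> ToDictionaries(DataTable table)
    {
        return table.Rows.Cast<DataRow>()
            .Select(row => table.Columns.Cast<DataColumn>()
                .ToDictionary(
                    col => col.ColumnName,
                    col => row.IsNull(col) ? null : row[col]))
            .ToList();
    }

    // Shred the dictionaries into (date, value) pairs for a chart.
    // Assumes every row has a non-null "Date" entry holding a DateTime;
    // a missing metric value is treated as 0.0.
    public static IEnumerable<KeyValuePair<DateTime, double>> ToPoints(
        IEnumerable<Dictionary<string, object>> rows, string metric)
    {
        return rows.Select(r => new KeyValuePair<DateTime, double>(
            (DateTime)r["Date"],
            r[metric] == null ? 0.0 : Convert.ToDouble(r[metric])));
    }
}
```

Calling TableShredder.ToPoints(TableShredder.ToDictionaries(dt), "Visits") would give you one pair per row, where "Visits" stands in for whatever metric column you have. If the result never leaves the method that builds it, you could project into an anonymous type instead of KeyValuePair.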

Many of the recent changes to the C# language have been to make it more functional. C# 4.0 is mostly about dynamic typing, but C# 3.0 (which shipped with .NET 3.5) is all about LINQ--or rather, the typing and syntactical changes that make LINQ possible. If you've been away from C# for a few years, the language might look almost completely alien to you due to the introduction of lambda expressions, expression trees, extension methods, implicit typing, nullable types, anonymous types, and anonymous functions. These features let you do pretty much anything you want with data on the fly.

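// Assumes dt is a DataTable, with Program and Metric already in scope.
var Points =
   from DataRow dr in dt.Rows
   // Parenthesize the cast: member access binds tighter than a cast.
   where ((string)dr["Program"]).Trim() == Program
   select new
   {
        Date = (DateTime)dr["Date"],
        // A DataRow holds DBNull.Value rather than null, so ?? would not catch it.
        Value = dr.IsNull(Metric) ? 0.0 : Convert.ToDouble(dr[Metric])
   };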

Though applications like this are where I use LINQ primarily, I can see its attraction for database querying. Among the ugliest and most difficult-to-debug pieces of OO code are SQL statements defined as string literals in application code. They're particularly ugly in VB.NET, since you have to include ampersands and line-continuation characters. The solution has always been to use stored procedures for all data access, but this best practice is not always practiced. LINQ provides you with strong typing, which is nothing to sneeze at. One of the biggest problems with SQL is its lack of type checking, especially when creating dynamic SQL. You could create some really cool data access classes that use LINQ instead of dynamic SQL.

Whether you use LINQ in C# or Clojure on the JVM, the functional turn in object-oriented programming is interesting for its own sake. On the one hand, languages are becoming more and more flexible and intuitive. They allow you to do a lot with a few lines, and in an intuitive manner that abstracts away details. On the other hand, languages are becoming more complicated than ever. In reading Jon Skeet's C# in Depth, I kept thinking how much more information it provided than I needed to get the job done. I'll be very interested to see how further hybridizations of language provide ever more useful tools, especially for manipulating data.

Sunday, December 11, 2011

Technology and Collective Problem-Solving

"Technology" signifies all the intelligent techniques by which the energies of nature and man are directed and used in satisfaction of human needs; it cannot be limited to a few outer and comparatively mechanical forms.
--John Dewey

In a previous post, I explained how many philosophers, including Heidegger and Marcuse, see a rift between ethical reflection and technology. They worry that the means-ends thinking at the heart of technology can cause us to ignore other kinds of reflection--especially about who we want to be, what we hold to be just, and how we can lead more meaningful lives.

There is obviously a difference between painting a picture and developing a manufacturing plant to make paints and brushes, but what's wrong with solving problems? Is it really so dangerous as philosophers--who aren't typically known for being technologists--seem to think? John Dewey says no, arguing that all inquiry has a technological component insofar as it is meant to solve problems. If moral inquiry helps us solve problems, it's as technological as lasers and airplanes are. Theories are just tools for solving problems.

Understanding technology as problem-solving may seem impossibly vague, but it's actually very powerful. Whenever considering a new gadget, theory, or way of doing things, Dewey suggests we ask: what is the problem this is meant to solve? Remarkably, many new products don't seem aimed at solving any problems, or at least not any serious ones.

What about the problem of collective decision-making? Humans have created two lasting technologies for this purpose: representative governments and markets. Governments are good at ensuring certain behaviors that their people think should be ensured. They define and enforce justice, including the means of determining what justice is. This wasn't always the case and took many years of trial and error. Life used to be filled with a lot more anxiety, because the world was so much more unpredictable, and the means of determining fairness were uncertain.

Althingi, where Icelanders have solved problems since 930 CE

Governments, however, can only solve certain problems. They're bad at picking market winners, for example, and they're slow to react to change. They are good at prohibiting certain behaviors, but it's hard for them to make citizens moral, healthy, intelligent, or cultured. As Richard Thaler and Cass Sunstein argue in their book Nudge, the best that governments may be able to do is incentivize certain behaviors so that people will make the right choices on their own.

Markets, on the other hand, provide a highly responsive way of determining what people value and what should be produced. As Friedrich Hayek recognized, markets aggregate people's individual choices and values and thus collectivize intelligence in a very efficient manner. Markets will always have the input of more people than governments as well as higher levels of participation. And, since people often know what they want better than 'experts,' markets can be more rational than governments.

Unfortunately, many things cannot be quantified in dollar values, such as the environment, health, or justice. We can adjust markets so that they take hidden costs into account, as cap-and-trade systems do, but these work best when you have a metric that can be easily tied to cost. Another criticism of markets is that people do not always act rationally, as Daniel Kahneman and other behavioral economists have shown. Even if we know what we want, we can't be sure to act accordingly.

Given the limitations of governments and markets, Deweyans turn to small groups for salvation. There are many interesting examples of small-scale collective problem-solving, such as the rebirth of Pittsburgh or river management in Mexico, but it's hard to see how such solutions will scale. As our interactions become ever more global, we need globalized methods of collective decision-making.

For these reasons, Clay Shirky and other technologists point to the internet as a possible third way of making intelligent choices collectively. It's not enough to say that the internet connects people. The idea of the internet as a 'Global Village' has become a joke, as new technologies help us filter each other out like never before. What Shirky points to is the way the internet lowers barriers to participation. Shirky's poster child is Wikipedia, which, like most internet phenomena, displays a long tail of participation. Many people work together, though the vast majority only contribute a little.

Lowering barriers is great, but it is probably not enough if we are to find a third way to compete with governments and markets. Can new technologies help us better solve collective problems? The question becomes ever more pressing as big players like Google, Microsoft, and Facebook become ever bigger and structure the ways we interact more and more. Not being evil is not the same thing as providing venues for increasing collective intelligence. What other problems should we be trying to solve?

Sunday, December 4, 2011

Branching and Merging

Nothing endures but change.
--Heraclitus

Heraclitus is usually remembered for saying that you can't step into the same stream twice, because the only thing real is change, not rivers or even mountains. Since most code is in about as much flux as the mighty Mississippi, I've often preferred the metaphor of streams to that of branches when thinking about version control. But really there's no perfect metaphor. What best describes a constantly growing and shrinking, branching and merging code base with information going in two directions and which can be frozen at any point?

I've always thought version control was one of the more interesting aspects of software development. Many people don't have a very good grasp of version control. It's another one of those key things they don't teach in school. But, truth be told, there's always something more to learn. Recently I discovered that there are as many ways to organize the branching structure of a repository as there are repositories. I had become accustomed to tying branches and tags (or streams and snapshots) to releases. A great MSDN article describes some of the more popular models.

First, of course, is the branch per release model. It's the model most used by software companies that have to support older versions of code while developing new ones. If you have some customers on 9.3 and some on 2.4, you have to be able to recreate older environments for debugging purposes. You can try to force all your customers to upgrade at the same time, but, given the different testing requirements they all might have, that's pretty unrealistic.

Most interesting of the new models I learned about was the branch per environment model. This makes a lot of sense if you're supporting in-house application code. It allows you to track what's going on in your different environments at any time, and it helps to keep your main (development) stream uncluttered. It can still get a bit messy when you have a lot of different development projects going on at the same time and with uncertain release dates. If your development is not fairly linear, this model may not be easy to implement.

To deal with multiple asynchronous development projects, you could do a branch per task. It's interesting to note that in this model branches have short lifespans, unlike in the previous two models, where they live in perpetuity. If 1.3 happens to go out before 1.2, you just have to merge 1.3 into the trunk and update the 1.2 code. I could see this being used in a relatively simple in-house shop, but definitely not at a software company.

Another option is having a branch per component. In this case, your core architecture is stable and your components do not bleed together. This might work if you have a waterfall-type SDLC in its initial stages and you want to isolate various web services, for example. If you want to avoid integration hell, however, this might not be the way to go.

Finally, you might have a branch per technology if you support multiple platforms. You would have your core code, which you could merge into all your different phone, gaming, or OS platforms. How well this strategy would work probably depends on the amount of overlap possible in the code of the different platforms.

There is no one best way to organize your repository. These models are just that--models. A combination of two or three might work the best for you. For example, you might have a branch per feature in your development stream, but then have fairly simple streams for your other environments. You could have branches for each technology, and then branches for each component within those branches. The possibilities are endless, but an overly complex system is probably more trouble than it's worth. Too few branches, and you'll have a difficult time developing. Too many, and you'll spend all your time integrating. Find a happy medium between a palm tree and a hedge labyrinth.

Links:
-The MSDN article