the Garden of Forking Paths: Abstraction


Sunday, June 16, 2013

The Clouds

I am walking in the air, and speculating about the sun.
-- Socrates, in Aristophanes' The Clouds

Self-explanatory.

What's wrong with having your head in the clouds? For starters, you might fall into a well. That's what happened to the ancient Greek philosopher Thales while he was gazing at the heavens. Aristophanes ridiculed Socrates in a satire called The Clouds for speculating about abstract nonsense without understanding anything about the here-and-now.

Slightly more recently, 'The Cloud' has arrived to save us from all earthly ills. Got a problem scaling? Move to the cloud. Want to reduce cost? Go to the cloud. Disaster recovery? Cloud. High availability? Cloud. Yet, the cloud can't really be a cure-all. It may be appropriate for your needs, but that has to be evaluated with an understanding of how your cloud implementation will work. If your number-one problem is performance or security, the cloud is probably not going to help.

There are many 'clouds' or high-level cure-alls available to developers today. For example, I've been learning the Spring Framework, which is a Dependency Injection (DI) framework. DI is a form of Inversion of Control (IoC), in which object coupling is not known at compile-time but is instead determined at run-time. Using a DI framework, you can define interfaces, write classes that implement them, and control application flow using configuration files and annotations (like @Inject). During development, you can stick to the pure work of thought--programming, that is--without having to worry so much about the nuts-and-bolts of connecting classes.
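To make the run-time wiring a little more concrete, here is a minimal, framework-free sketch of constructor injection. It is not how Spring works internally; `Notifier`, `EmailNotifier`, `SmsNotifier`, `OrderService`, and the configuration dictionary are all hypothetical names, with the dictionary standing in for a Spring configuration file or annotations.

```python
from typing import Protocol


class Notifier(Protocol):
    """The interface that business code depends on."""

    def send(self, message: str) -> str: ...


class EmailNotifier:
    def send(self, message: str) -> str:
        return f"email: {message}"


class SmsNotifier:
    def send(self, message: str) -> str:
        return f"sms: {message}"


class OrderService:
    # The dependency is passed in (injected); OrderService never names a concrete class.
    def __init__(self, notifier: Notifier) -> None:
        self.notifier = notifier

    def place_order(self, item: str) -> str:
        return self.notifier.send(f"order placed for {item}")


# Hypothetical configuration, standing in for a Spring config file or annotations.
# Which class gets wired in is decided when this code runs, not when it is written.
IMPLEMENTATIONS = {"email": EmailNotifier, "sms": SmsNotifier}


def build_order_service(config: dict) -> OrderService:
    notifier_cls = IMPLEMENTATIONS[config["notifier"]]
    return OrderService(notifier_cls())


service = build_order_service({"notifier": "sms"})
print(service.place_order("book"))  # sms: order placed for book
```

Switching to email is a configuration change, not a code change in `OrderService`. The price is that the wiring now lives somewhere else, which is exactly the trade-off the next paragraph is about.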

DI frameworks like Spring are just one more layer of abstraction (or one more cloud) that let us focus on business logic rather than implementation details. And, just like other layers of abstraction, they're a double-edged sword. Dependency injection isn't always a great idea, for example when your interfaces are likely to change often. Dependency information has to be stored somewhere, after all. Enterprise frameworks tend to give you flexibility at the price of code bloat and complexity. In my time, I've seen programming become ever more complex, but, at the same time, ever more like plumbing. Frameworks like Spring, Rails, and .NET's Entity Framework let you quickly build applications. But they don't prevent you from doing very stupid things if you don't know how they really work. ORM libraries are great, for example, until you have to scale.
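One common way ORMs bite at scale is that each attribute access can quietly become its own SQL query, so loading N rows and their related rows costs N + 1 queries (often called the N+1 query problem). The sketch below fakes an ORM's lazy loading with Python's built-in `sqlite3` module; the `run` helper, table names, and data are hypothetical and only there to count statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """
)
conn.executemany(
    "INSERT INTO author (id, name) VALUES (?, ?)",
    [(i, f"author {i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO book (author_id, title) VALUES (?, ?)",
    [(i, f"book by author {i}") for i in range(1, 101)],
)
conn.commit()

query_count = 0


def run(sql, params=()):
    """Hypothetical helper: execute one statement and count it, like an ORM's query log."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Lazy loading: one query for the authors, then one more per author (N + 1 queries).
query_count = 0
titles = []
for author_id, _name in run("SELECT id, name FROM author"):
    titles += run("SELECT title FROM book WHERE author_id = ?", (author_id,))
lazy_queries = query_count

# Eager loading: a single JOIN returns the same data in one round trip.
query_count = 0
rows = run("SELECT a.name, b.title FROM author a JOIN book b ON b.author_id = a.id")
joined_queries = query_count

print(lazy_queries, joined_queries)  # 101 1
```

Against an in-memory database the difference is invisible; against a real database server, each of those 101 queries is a network round trip.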

For this reason, I've been brushing up on my data structures and algorithms. It's easy to think such things don't matter. Plenty of people get by without earning computer science degrees. And who cares if a client-side algorithm is O(N^2) when the database is orders of magnitude slower? You might concede that algorithmic theory is worth studying only to develop certain intuitions about processing time, memory size, problem size, and algorithmic complexity, which is certainly true. But I think it's useful to revisit data structures and algorithms once in a while to stay grounded in reality. If you're only doing high-level plumbing, you have your head in the clouds.
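As a small, hypothetical illustration of the kind of intuition at stake, here are two ways to check a list for duplicates. Both give the same answer, but one compares every pair and the other remembers what it has seen in a hash set; counting the steps makes the difference in growth visible.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: about N*(N-1)/2 comparisons, i.e. O(N^2)."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicate_linear(items):
    """Remember what we've seen in a hash set: about N lookups, i.e. O(N) on average."""
    seen = set()
    lookups = 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups


data = list(range(2_000))  # no duplicates, so both functions do their maximum work
print(has_duplicate_quadratic(data))  # (False, 1999000)
print(has_duplicate_linear(data))     # (False, 2000)
```

Doubling the list size roughly quadruples the first count but only doubles the second.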

Sunday, January 20, 2013

How the Internet Works

I never really had a proper introduction to networking until taking an online course. I recommend the course, even for people with little experience in technology. It's really not rocket science. A network engineer I know, a former plumber, claims it's exactly like plumbing--"It's all pipes," he says.

Network engineers may be a bit like plumbers, but the architects of the Internet were cultivators of complex systems. They needed to strike a balance between being too rigid, and thus hampering future innovation, on the one hand, and being too open, and thus causing Tower-of-Babel-like chaos, on the other. These architects used at least two key conceptual tools to manage complexity: encapsulation and layering. Software developers should be familiar with both terms. For example, object-oriented programmers use encapsulation to hide the implementation details of a class from the code that uses it, and they use layering to decouple functional domains.

This is all a fancy way of saying that the development of complex systems requires black boxes. It's impossible for any one person to understand everything that is going on in a reasonably complex system, so we must be able to say, "Well, we're not going to worry about that right now." Furthermore, if a change in one place requires a change in ten other places, you have a very fragile system.

The Internet has four layers. In the highest layer, applications talk to each other without worrying about any of the messy details. For example, your browser just requests a web page--it doesn't care how it gets there. The transport protocol (probably TCP) establishes connections and ensures that data is reliably delivered in sequence. It also manages congestion. TCP doesn't care how the data segments are transmitted. This is the job of the network layer, most likely IP. IP is a simple delivery service that does its best to get a message from one location to another. Routers direct traffic to the appropriate address based on routing tables, but there is no overall plan for the path packets should take. Finally, the link or physical layer is composed of the actual cables or wireless hardware. It relays bits, but it has no idea what the bits mean.

What's amazing about the Internet is that it hasn't changed much in forty years. We've never had to start over from scratch. We didn't have to stand by while the Masters of the Internet upgraded to version 2.0. In fact, it works so well that we usually don't have to think about it. Changes to the Internet, such as the rollout of TCP Tahoe's congestion control in 1988 after the congestion collapses of 1986, happened pretty seamlessly. The current switch to IPv6 is largely transparent to users.

The main reason for the Internet's resiliency is its use of layering and encapsulation. Because each layer functions independently, its implementation is encapsulated, or hidden, from the layers above. For example, there has even been a tongue-in-cheek proposal, RFC 1149, to carry IP datagrams by carrier pigeon. The application and transport layers would still work the same, just a bit slower.
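The pigeon point can be sketched in a few lines. Below, a toy "transport layer" function depends only on an object with a `transmit` method, so a hypothetical `PigeonLink` can replace `CableLink` without touching the code above it. The class names and the two-byte sequence-number header are invented for illustration and are not a real protocol format.

```python
from typing import Protocol


class Link(Protocol):
    """All the upper layers rely on: something that can carry bytes."""

    def transmit(self, frame: bytes) -> bytes: ...


class CableLink:
    def transmit(self, frame: bytes) -> bytes:
        return frame  # delivered essentially instantly


class PigeonLink:
    """Hypothetical link in the spirit of RFC 1149: same interface, much slower."""

    def __init__(self) -> None:
        self.flights = 0

    def transmit(self, frame: bytes) -> bytes:
        self.flights += 1  # one bird per frame
        return frame


def send_message(message: str, link: Link, segment_size: int = 4) -> str:
    """Toy transport layer: split into numbered segments, send them, reassemble in order."""
    data = message.encode()
    segments = [
        (seq, data[i:i + segment_size])
        for seq, i in enumerate(range(0, len(data), segment_size))
    ]
    received = []
    for seq, chunk in segments:
        frame = seq.to_bytes(2, "big") + chunk  # 2-byte sequence number as a header
        delivered = link.transmit(frame)
        received.append((int.from_bytes(delivered[:2], "big"), delivered[2:]))
    received.sort()  # reassemble by sequence number
    return b"".join(chunk for _, chunk in received).decode()


pigeons = PigeonLink()
print(send_message("hello, internet", CableLink()))  # hello, internet
print(send_message("hello, internet", pigeons))      # hello, internet
print(pigeons.flights)                               # 4
```

`send_message` never learns which link it used; that ignorance is what lets a lower layer be swapped out.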

Layering and encapsulation are made possible by using a message-in-a-message technique, much like Russian stacking dolls. Network traffic is made up of messages. TCP calls them segments, IP calls them datagrams, and Ethernet calls them frames, but they're all composed of a payload and a message header containing instructions for a particular protocol. The payload may contain another message consisting of another header and payload, and so on, down to the physical layer.

For example, when you go to a website, you use the HTTP protocol (application layer). Every HTTP message consists of a header and an optional body: your browser sends a GET request, and the server sends back a response whose header (e.g. 200 OK) is followed by the web page as the body. This response is broken up into the payloads of messages with TCP headers (transport layer). These headers contain port numbers and sequence numbers that ensure safe, in-order delivery. The TCP messages are the payloads of yet other messages, having headers with IP destination addresses (network layer). Finally, these messages are the payloads of yet more messages, such as Ethernet frames, whose headers contain MAC addresses (link layer).
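Here is a rough sketch of that nesting as plain data. Each layer wraps the message from the layer above as its payload, and the receiver peels off one header at a time. The header fields are heavily simplified, the response is sent as a single TCP segment for simplicity, and the port number is hypothetical; the IP and MAC addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    layer: str
    header: dict
    payload: Union["Message", bytes]


def encapsulate(page: bytes) -> Message:
    # Application layer: an HTTP response whose body is the web page.
    http = Message("HTTP", {"status": "200 OK", "Content-Type": "text/html"}, page)
    # Transport layer: ports and a sequence number for reliable, in-order delivery.
    tcp = Message("TCP", {"src_port": 80, "dst_port": 51234, "seq": 1}, http)
    # Network layer: IP addresses say where the datagram should go.
    ip = Message("IP", {"src": "203.0.113.10", "dst": "192.0.2.7"}, tcp)
    # Link layer: MAC addresses for the next hop on the local network.
    return Message(
        "Ethernet", {"src_mac": "00:00:5e:00:53:01", "dst_mac": "00:00:5e:00:53:02"}, ip
    )


def decapsulate(frame: Message) -> bytes:
    """Peel off one header per layer until only the application data is left."""
    current: Union[Message, bytes] = frame
    while isinstance(current, Message):
        print(current.layer, current.header)
        current = current.payload
    return current


page = decapsulate(encapsulate(b"<html>Hello</html>"))
print(page)  # b'<html>Hello</html>'
```

Each layer only reads its own header and treats everything inside as opaque bytes.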

The message-in-a-message technique is simple yet powerful. For example, a VPN can work by recursively layering messages. You might have an HTTP request in a TCP segment in an IP packet in an encrypted TLS record in a TCP segment in an IP packet in an Ethernet frame. Requests over a VPN are encapsulated within normal Internet traffic.
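The recursion in that VPN example can be shown with a string-based toy. It is not a real VPN or packet format, and it assumes the payload contains no square brackets; it just wraps a payload in the layers listed and then unwraps it again.

```python
def wrap(payload: str, layers: list[str]) -> str:
    """Wrap the payload in each layer in turn; the last layer listed is the outermost."""
    for layer in layers:
        payload = f"{layer}[{payload}]"
    return payload


def unwrap(message: str) -> list[str]:
    """Strip the outermost layer repeatedly, recording each layer's name."""
    names = []
    while "[" in message:
        name, _, rest = message.partition("[")
        names.append(name)
        message = rest[:-1]  # drop the matching closing bracket
    names.append(message)  # the innermost payload
    return names


# The inner TCP/IP stack is the tunneled traffic; TLS encrypts it, and the outer
# TCP, IP, and Ethernet headers carry it like any other Internet traffic.
vpn_stack = ["TCP", "IP", "TLS", "TCP", "IP", "Ethernet"]
packet = wrap("HTTP request", vpn_stack)
print(packet)
# Ethernet[IP[TCP[TLS[IP[TCP[HTTP request]]]]]]
print(unwrap(packet))
# ['Ethernet', 'IP', 'TCP', 'TLS', 'IP', 'TCP', 'HTTP request']
```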

IPv6 works in a similar manner. We've already begun to exhaust the 4.29 billion IPv4 addresses (like the familiar 192.168.1.1), and we have started using IPv6's 3.4×10³⁸ addresses (like the not-yet-familiar 2a91:0db8:85a3:0000:0000:8a2e:0370:7334). We don't need two Internets to support both network protocols, thanks to a technique called tunneling. The routers or hosts at either end of a tunnel can carry IPv6 datagrams across IPv4 routers by simply putting the IPv6 datagrams inside IPv4 ones. That is, IPv6 tunnels through existing IPv4 networks.
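The address counts are just powers of two: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long. The sketch below checks that arithmetic with Python's standard `ipaddress` module, then illustrates tunneling with a hypothetical `tunnel_6in4` helper whose dictionary is a simplified stand-in for a real IPv4 header.

```python
import ipaddress

# Every possible address in each space (including reserved and private ranges).
ipv4_total = ipaddress.IPv4Network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.IPv6Network("::/0").num_addresses        # 2**128
print(f"IPv4: {ipv4_total:,} addresses (~{ipv4_total / 1e9:.2f} billion)")
print(f"IPv6: {ipv6_total:.2e} addresses")

# Leading zeros, and one run of all-zero groups, may be dropped when writing IPv6 addresses.
addr = ipaddress.IPv6Address("2a91:0db8:85a3:0000:0000:8a2e:0370:7334")
print(addr.compressed)  # 2a91:db8:85a3::8a2e:370:7334


def tunnel_6in4(ipv6_packet: bytes, src: str, dst: str) -> dict:
    """Hypothetical helper: carry an IPv6 packet as the payload of an IPv4 packet.

    IP protocol number 41 tells the receiving tunnel endpoint that the
    payload is an IPv6 packet, which it unwraps and forwards natively.
    """
    return {
        "version": 4,
        "protocol": 41,
        "src": ipaddress.IPv4Address(src),
        "dst": ipaddress.IPv4Address(dst),
        "payload": ipv6_packet,
    }


outer = tunnel_6in4(b"<an IPv6 packet>", "198.51.100.1", "203.0.113.1")
print(outer["protocol"], outer["dst"])  # 41 203.0.113.1
```

The IPv4 routers in between only ever look at the outer header, so they never need to know IPv6 exists.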

We don't know what the Internet will look like 100 years from now, but it will probably have new layers stacked on top of or recursively contained within existing layers. It will evolve piecemeal, without major upgrades causing blackouts. It may swap out TCP for some other, more efficient communications protocol, or it may replace the physical layer with faster wireless relays. Because of layering and encapsulation, changes to any one functional layer should have little or no impact on the others.

Friday, August 24, 2012

Who Thinks Abstractly?

I think my word for the year is 'abstraction.' Where would we be without it? Every major development in programming languages, like almost every major development in the world, involves a new layer of abstraction. For instance, assembly provides a layer of abstraction so you don't have to program in 0's and 1's. (Aren't you glad?) OO programming's tenet of encapsulating behavior and data allows you to abstract away the implementation. N-Tier application development lets you abstract away different layers of code. Web services allow you to abstract away domain languages.

The 0's and 1's are still there--somewhere. And maybe your system would run faster if you optimized the byte code. But it's probably not worth the effort. In exchange for a little performance, you get a huge productivity boost. And programming becomes a whole lot more fun. It becomes more about concepts and systems and less about abstruse technical matters. I can't even think of the last time I had to worry about memory allocation.

It's easy to chart the rise of abstraction against the popularity of UML, a system of diagrammatic standards for modelling code (which is itself a model). UML has become necessary in order to deal with the interactions of our ever-more-complex systems.

Philippe Kruchten's 4+1 view model

I've been shoring up my understanding of UML, and one thing I never really thought about was how it provides a set of windows onto systems. UML lets you see the use cases, logic, processes, development, and physical makeup of a system. Though certain models have become very popular (such as sequence diagrams for modelling messaging systems), which particular one you need depends on the problem you're trying to solve.

You can look at a system from the perspective of:

A user (use case diagrams)
A business analyst (activity diagrams)
A developer (class, sequence, communication, and timing diagrams)
An architect (component, package, and state machine diagrams)
A systems architect (deployment diagrams)

UML is nice because it standardizes the napkin drawings we tend to do naturally. It gives us a common vocabulary we can all utilize. And it's not hard to learn.

Hot or not?

But I think developers worry that UML--or something like it--could fulfil the dream of managers everywhere: removing programmers from the picture entirely. Programming has changed drastically since I first started coding, and it's going to change a lot more before I kick the bucket. It's become much more about the interaction between technologies (stacks, libraries, and third-party tools) and systems (services, ETL, and domains).

I don't think the need for systems thinkers is going anywhere anytime soon. We'll always need people who can grasp the vocabulary and design of systems and who can chart the ways they should interact and change. But what's really important is understanding the points where systems break down.

In the early 1800s the philosopher GWF Hegel posed the question, Who Thinks Abstractly? His answer: commoners, not lords and ladies. Lowly folk accept concepts at face value. In the case of a murder trial, for example, all they care about is the fact that someone has been deemed a murderer. It doesn't matter that mommy never loved them or that a bad situation got out of hand--they must be hanged.

But particularities do matter, as lords and ladies, who were afraid to label anyone a murderer, understood. People judged to be murderers are more than just murderers. That's why we have so many categories of murder. Unfortunately, while lords and ladies understood particularities, they couldn't see the forest for the trees.

Hegel's dialectic: a model for thinking that undermines the possibility of models

Hegel's main idea is that we need to balance the use of systems of abstractions with an attention to messy reality.

Technology solutions are abstractions just like the concept of murder. And like all abstractions, they don't always do the work they were designed to do. Similarly, business rules can become too rigid to capture the ways business really happens. Systems (whether conceptual or technological) are powerful because they provide a vocabulary that lets us abstract away and control the complexities of reality. But those complexities remain.

Almost anyone who has developed software has faced the tradeoff between creating an elegant system and automating a process that has more exceptions than can be counted. The same tension is why translation is so difficult--language as it is spoken and written always outstrips grammatical rules and dictionary definitions. This struggle between rules and reality will never end.

IQ has increased over time, and jobs have become more and more about systems. If you asked someone what a market was a hundred years ago, they'd point to a physical market rather than describe a system of exchange. Everyone needs to understand abstract concepts and systems today, but we also need people who can understand the particular situations where those systems fail. Whether or not we will have people writing OO code 20 years from now, we'll still need developers who can see the edges of their systems and analysts who can see their box-and-line diagrams for the abstractions they really are.

Sunday, August 12, 2012

The Physical and the Virtual

lol

The Internet is awesome--that's why everyone wants to be there. It's got cheap stuff, lolcats, all kinds of games you can play with your friends, news about even the most mundane things, and every conceivable way of 'talking' to other people without having to deal with them face to face. Did I mention that it's global and instantaneous too? Sold!

The Internet has shaped our lives so much that it's easy to become fed up with the real world. Why do I have to sit in traffic when I could telecommute? Why should I go to the store when I can buy online and have it the next day? Why should I learn anything when I can look it up on my phone? Why should I have to slog away at my job when kids are making millions selling apps? Why should I put up with a girlfriend when I can... well...

In short, why should I do anything that doesn't give me immediate satisfaction?

The shit flies when our speed-of-light connection is lost. I've been a cord cutter for a few years now. I watch less television, I don't pay the big cable companies quite so much, and I've learned to deal with the lag between air time and online availability. But, every so often, the cable goes out and I suddenly have no idea how to function. "What are we supposed to do?" I say.

Don't cut that line or we'll lose New Zealand!

It's times like these that I remember the virtual world can't be separated from the physical world. Even as a developer with some (albeit limited) knowledge of networking, I find it easy to forget the routers, cables, protocols, and standards that make it all work. We're abstracted from all that.

The story of the physical Internet is fascinating, and Andrew Blum tells it well in his new book, Tubes. Reading it, I remembered all the things I should have known already--like that the Internet is a network of networks, so there have to be places where these networks connect to each other. (Duh.) These stuffy closets and unmarked office buildings that Blum describes are where the Internet is, if it can be said to be anywhere. There's a whole other world that underlies our virtual 'reality.'

A subway map shows how to get between points above ground, not what happens below

During the summers of my youth I ran cables for the school district, among less glamorous tasks. It was a shock when I first popped up the drop ceiling at my high school and realized that there was a huge space between the tiled 'ceiling' and the floor above. This in-between space is full of conduits, plumbing, cables, HVAC systems, and meters that make our comfortable lives possible.

Being abstracted from the guts of the system is good. It lets us deal with the things we care about, like learning, while ignoring other things, like keeping the building at a comfortable temperature. Supermarkets abstract us from the food chain so we can focus on cooking. Virtualized desktops and servers abstract us from physical hardware so we can focus on computing. High-level languages abstract us from their implementation so we can focus on coding.

But abstraction has a cost. When the AC breaks, the tomato crop is ruined by disease, the network is down, or the compiler does something really stupid, we're hosed. We can't learn, cook, compute, or program.

I don't think there's some big moral to this story. We obviously can't get rid of HVAC, supermarkets, or OO languages. (Right?) But maybe it's a good idea to understand these physical worlds underlying our virtual ones... before they break.